git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Limited local file inclusion with .mailmap symlinks and git-archive
@ 2021-02-13 17:49 Blake Burkhart
  2021-02-15 23:17 ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Blake Burkhart @ 2021-02-13 17:49 UTC (permalink / raw)
  To: git, Jeff King

Git's mailmap implementation first tries using the blob from the
repository, but also supports using a local (possibly uncommitted)
.mailmap file. When reading from the local file, git will follow
symlinks. If a symlink is committed to a repository named .mailmap,
git will parse the file on the other side of the symlink if the
repository is cloned locally.

Git log supports an %aN placeholder which prints the result of the
mailmap, if it is possible for this value to be sent to an attacker
this could become a local file inclusion concern. With git-archive it
is possible to use $Format:%aN$ to include this value in an exported
archive.

Running git on bare repos or using git archive --remote=... is
unaffected because a local file is never used, only the in-repo blob.

Git's mailmap parser is very forgiving, it reads in each line, skips
lines starting with #, then considers whatever it finds between < and
> as the email address. It is even possible to use binary files as a
.mailmap. As a demonstration I used a symlink to /proc/self/exe (which
itself is a symlink to /usr/bin/git). The string [--exec-path[= was
extracted from the binary as the author name.

git init mailmap
cd mailmap
ln -s /proc/self/exe .mailmap
echo "test export-subst" > .gitattributes
echo '$Format:%aN$' > test
git add .mailmap .gitattributes test
git commit -m "test" --author="foo <path>"
cd ..

# Pretend you're cloning from the internet...
git clone mailmap mailmap-clone
cd mailmap-clone
git archive --format=tar HEAD
# Output contains [--exec-path[=

These are unaffected:

cd ..
git --git-dir=mailmap/.git archive --format=tar HEAD
git archive --remote=git://localhost/ --format=tar HEAD

I reported this issue to the private security list first and discussed
this issue with Peff. This is similar to existing concerns with
.gitmodules, .gitattributes and .gitignore. Git already disallows
checking out a .gitmodules file from a repository, and I understand
there are in progress patches to add similar protection for
.gitattributes and .gitignore. Please ensure the .mailmap file gets
similar symlink protection.

Exploitability is limited because the targeted file must contain a
string formatted like <foo> known to the attacker, or attacker
controlled. Also, most automated build systems that checkout code are
sandboxed and prepared to run arbitrary code already (it is
technically possible to read potentially sensitive variables from
/proc/self/environ with this, but is very limited because it contains
null bytes).

--
Blake Burkhart

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Limited local file inclusion with .mailmap symlinks and git-archive
  2021-02-13 17:49 Limited local file inclusion with .mailmap symlinks and git-archive Blake Burkhart
@ 2021-02-15 23:17 ` Jeff King
  2021-02-15 23:18   ` [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning Jeff King
  2021-02-15 23:19   ` [PATCH 2/2] disallow symlinked .mailmap files Jeff King
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff King @ 2021-02-15 23:17 UTC (permalink / raw)
  To: Blake Burkhart; +Cc: Junio C Hamano, git

On Sat, Feb 13, 2021 at 11:49:32AM -0600, Blake Burkhart wrote:

> I reported this issue to the private security list first and discussed
> this issue with Peff. This is similar to existing concerns with
> .gitmodules, .gitattributes and .gitignore. Git already disallows
> checking out a .gitmodules file from a repository, and I understand
> there are in progress patches to add similar protection for
> .gitattributes and .gitignore. Please ensure the .mailmap file gets
> similar symlink protection.

Thanks again for bringing this up.

Here are some patches that I think will help. They're meant to be
applied on the stalled jk/symlinked-dotgitx-files topic, which Junio has
been carrying in "seen" for a few months now.

The sticking point there was that we were concerned that the fsck checks
for .gitattributes/.gitignore would catch historical commits in real
projects, making them annoying to work with. So the first patch here
loosens those checks to warnings. I think this is safe enough, as the
real protection is in preventing checkouts in the index code paths (the
fsck checks are really just about protecting other clients using older
versions, but the severity of these attacks is so low that the tradeoff
doesn't make as much sense).

Obviously this could be squashed into the earlier patches, but I think
documenting the change of direction with a separate commit makes sense.

And then the second patch adds similar .mailmap support (also as a
warning, since I think it is largely in the same boat, and it makes
sense to be consistent).

  [1/2]: fsck: make symlinked .gitignore and .gitattributes a warning
  [2/2]: disallow symlinked .mailmap files

 cache.h                      |  1 +
 fsck.c                       | 10 ++++++++--
 path.c                       |  5 +++++
 read-cache.c                 |  6 ++++--
 t/helper/test-path-utils.c   |  5 +++++
 t/t0060-path-utils.sh        | 10 ++++++++++
 t/t7450-bad-dotgitx-files.sh | 26 +++++++++++++++++++-------
 utf8.c                       |  5 +++++
 utf8.h                       |  1 +
 9 files changed, 58 insertions(+), 11 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning
  2021-02-15 23:17 ` Jeff King
@ 2021-02-15 23:18   ` Jeff King
  2021-02-16  0:38     ` Ævar Arnfjörð Bjarmason
  2021-02-15 23:19   ` [PATCH 2/2] disallow symlinked .mailmap files Jeff King
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff King @ 2021-02-15 23:18 UTC (permalink / raw)
  To: Blake Burkhart; +Cc: Junio C Hamano, git

We recently added fsck checks to complain about symlinked .gitignore and
.gitattributes files, which are no longer allowed to be checked out.
This is partially to inform fsck users of the problem, but also to
protect older clients from receiving them (by blocking push and fetch
via transfer.fsckObjects).

While there are some minor security implications to having these files
be symlinks, this is out-weighed by the inconvenience of blocking
historical commits in some projects that might include them.

Let's loosen the fsck check to a warning. It will continue to be
reported by both git-fsck and transfer.fsckObjects, but will not impact
the exit code or the acceptance of objects. Note that internally in
fsck.c this is called "INFO", but the word "warning" will appear in
user-visible output.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                       |  4 ++--
 t/t7450-bad-dotgitx-files.sh | 26 ++++++++++++++++++--------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fsck.c b/fsck.c
index d0a201348d..c75c7d7dc7 100644
--- a/fsck.c
+++ b/fsck.c
@@ -67,8 +67,6 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_URL, ERROR) \
 	FUNC(GITMODULES_PATH, ERROR) \
 	FUNC(GITMODULES_UPDATE, ERROR) \
-	FUNC(GITIGNORE_SYMLINK, ERROR) \
-	FUNC(GITATTRIBUTES_SYMLINK, ERROR) \
 	/* warnings */ \
 	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
@@ -81,6 +79,8 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(NUL_IN_COMMIT, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(GITIGNORE_SYMLINK, INFO) \
+	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	/* ignored (elevated when requested) */ \
diff --git a/t/t7450-bad-dotgitx-files.sh b/t/t7450-bad-dotgitx-files.sh
index 326b34e167..4b1edb150e 100755
--- a/t/t7450-bad-dotgitx-files.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -140,6 +140,16 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 '
 
 check_forbidden_symlink () {
+	fsck_must_fail=test_must_fail
+	fsck_prefix=error
+	case "$1" in
+	--fsck-warning)
+		fsck_must_fail=
+		fsck_prefix=warning
+		shift
+		;;
+	esac
+
 	name=$1
 	type=$2
 	path=$3
@@ -172,8 +182,8 @@ check_forbidden_symlink () {
 
 			# Check not only that we fail, but that it is due to the
 			# symlink detector
-			test_must_fail git fsck 2>output &&
-			test_i18ngrep "tree $tree: ${name}Symlink" output
+			$fsck_must_fail git fsck 2>output &&
+			test_i18ngrep "$fsck_prefix.*tree $tree: ${name}Symlink" output
 		)
 	'
 
@@ -193,13 +203,13 @@ check_forbidden_symlink gitmodules vanilla .gitmodules
 check_forbidden_symlink gitmodules ntfs ".gitmodules ."
 check_forbidden_symlink gitmodules hfs ".${u200c}gitmodules"
 
-check_forbidden_symlink gitattributes vanilla .gitattributes
-check_forbidden_symlink gitattributes ntfs ".gitattributes ."
-check_forbidden_symlink gitattributes hfs ".${u200c}gitattributes"
+check_forbidden_symlink --fsck-warning gitattributes vanilla .gitattributes
+check_forbidden_symlink --fsck-warning gitattributes ntfs ".gitattributes ."
+check_forbidden_symlink --fsck-warning gitattributes hfs ".${u200c}gitattributes"
 
-check_forbidden_symlink gitignore vanilla .gitignore
-check_forbidden_symlink gitignore ntfs ".gitignore ."
-check_forbidden_symlink gitignore hfs ".${u200c}gitignore"
+check_forbidden_symlink --fsck-warning gitignore vanilla .gitignore
+check_forbidden_symlink --fsck-warning gitignore ntfs ".gitignore ."
+check_forbidden_symlink --fsck-warning gitignore hfs ".${u200c}gitignore"
 
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
-- 
2.30.1.986.gd86016a168


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/2] disallow symlinked .mailmap files
  2021-02-15 23:17 ` Jeff King
  2021-02-15 23:18   ` [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning Jeff King
@ 2021-02-15 23:19   ` Jeff King
  1 sibling, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-15 23:19 UTC (permalink / raw)
  To: Blake Burkhart; +Cc: Junio C Hamano, git

Symlinked .mailmap files do not work as users might expect and open up a
security concern:

  - when we read the files from a blob (e.g., in a bare repository as
    HEAD:.mailmap), we would not respect the symlink (and in fact would
    treat the symlink target as a mailmap entry, which is confusing at
    best).

  - if the symlink points outside of the working tree, this can be used
    to exfiltrate data from a service that can be convinced to check out
    an untrusted repo and run "git shortlog" or similar

Let's give .mailmap the same treatment we gave to .gitignore and
.gitattributes: it will be disallowed for checkout, and produce an fsck
warning (which can be upgraded in severity using config).

Note that we're relying on our ntfs/hfs path-matching infrastructure
here. That was originally designed for .git*, but it should work in this
instance, too (the important thing is that our "needle" path is still
plain ASCII, so we don't need to know the full range of case folds).

The NTFS short-name hash was generated using the perl script included in
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11). Though
note that I needed to use "%04x" when formatting the final hex, as the
hash value for this file happens to begin with a 0.

Reported-by: Blake Burkhart <bburky@bburky.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h                      |  1 +
 fsck.c                       |  6 ++++++
 path.c                       |  5 +++++
 read-cache.c                 |  6 ++++--
 t/helper/test-path-utils.c   |  5 +++++
 t/t0060-path-utils.sh        | 10 ++++++++++
 t/t7450-bad-dotgitx-files.sh |  4 ++++
 utf8.c                       |  5 +++++
 utf8.h                       |  1 +
 9 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index d928149614..a29084e87c 100644
--- a/cache.h
+++ b/cache.h
@@ -1254,6 +1254,7 @@ int is_ntfs_dotgit(const char *name);
 int is_ntfs_dotgitmodules(const char *name);
 int is_ntfs_dotgitignore(const char *name);
 int is_ntfs_dotgitattributes(const char *name);
+int is_ntfs_dotmailmap(const char *name);
 
 /*
  * Returns true iff "str" could be confused as a command-line option when
diff --git a/fsck.c b/fsck.c
index c75c7d7dc7..860af9cb5b 100644
--- a/fsck.c
+++ b/fsck.c
@@ -81,6 +81,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_PARSE, INFO) \
 	FUNC(GITIGNORE_SYMLINK, INFO) \
 	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	/* ignored (elevated when requested) */ \
@@ -703,6 +704,11 @@ static int fsck_tree(const struct object_id *tree_oid,
 				retval += report(options, tree_oid, OBJ_TREE,
 						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
 						 ".gitattributes is a symlink");
+			if (is_hfs_dotmailmap(name) ||
+			    is_ntfs_dotmailmap(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_MAILMAP_SYMLINK,
+						 ".mailmap is a symlink");
 		}
 
 		if ((backslash = strchr(name, '\\'))) {
diff --git a/path.c b/path.c
index 7b385e5eb2..2c496d6161 100644
--- a/path.c
+++ b/path.c
@@ -1493,6 +1493,11 @@ int is_ntfs_dotgitattributes(const char *name)
 	return is_ntfs_dot_str(name, "gitattributes", "gi7d29");
 }
 
+int is_ntfs_dotmailmap(const char *name)
+{
+	return is_ntfs_dot_str(name, "mailmap", "maba30");
+}
+
 int looks_like_command_line_option(const char *str)
 {
 	return str && str[0] == '-';
diff --git a/read-cache.c b/read-cache.c
index dd17ebd236..322eddb874 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -984,7 +984,8 @@ int verify_path(const char *path, unsigned mode)
 				if (S_ISLNK(mode)) {
 					if (is_hfs_dotgitmodules(path) ||
 					    is_hfs_dotgitignore(path) ||
-					    is_hfs_dotgitattributes(path))
+					    is_hfs_dotgitattributes(path) ||
+					    is_hfs_dotmailmap(path))
 						return 0;
 				}
 			}
@@ -998,7 +999,8 @@ int verify_path(const char *path, unsigned mode)
 				if (S_ISLNK(mode)) {
 					if (is_ntfs_dotgitmodules(path) ||
 					    is_ntfs_dotgitignore(path) ||
-					    is_ntfs_dotgitattributes(path))
+					    is_ntfs_dotgitattributes(path) ||
+					    is_ntfs_dotmailmap(path))
 						return 0;
 				}
 			}
diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 732372b97f..98044d53fa 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -409,6 +409,11 @@ int cmd__path_utils(int argc, const char **argv)
 				     is_hfs_dotgitattributes,
 				     is_ntfs_dotgitattributes);
 	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotmailmap")) {
+		return check_dotgitx("mailmap", argv + 2,
+				     is_hfs_dotmailmap,
+				     is_ntfs_dotmailmap);
+	}
 
 	if (argc > 2 && !strcmp(argv[1], "file-size")) {
 		int res = 0, i;
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index fab077b6f7..de4960783f 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -488,6 +488,16 @@ test_expect_success 'match .gitignore' '
 		GI250A~1
 '
 
+test_expect_success 'match .mailmap' '
+	test-tool path-utils is_dotmailmap \
+		.mailmap \
+		.mail${u200c}map \
+		.Mailmap \
+		.mailmaP \
+		MAILMA~1 \
+		MABA30~1
+'
+
 test_expect_success MINGW 'is_valid_path() on Windows' '
 	test-tool path-utils is_valid_path \
 		win32 \
diff --git a/t/t7450-bad-dotgitx-files.sh b/t/t7450-bad-dotgitx-files.sh
index 4b1edb150e..c12f0373fd 100755
--- a/t/t7450-bad-dotgitx-files.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -211,6 +211,10 @@ check_forbidden_symlink --fsck-warning gitignore vanilla .gitignore
 check_forbidden_symlink --fsck-warning gitignore ntfs ".gitignore ."
 check_forbidden_symlink --fsck-warning gitignore hfs ".${u200c}gitignore"
 
+check_forbidden_symlink --fsck-warning mailmap vanilla .mailmap
+check_forbidden_symlink --fsck-warning mailmap ntfs ".mailmap ."
+check_forbidden_symlink --fsck-warning mailmap hfs ".${u200c}mailmap"
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
diff --git a/utf8.c b/utf8.c
index 5b39361ada..de4ce5c0e6 100644
--- a/utf8.c
+++ b/utf8.c
@@ -777,6 +777,11 @@ int is_hfs_dotgitattributes(const char *path)
 	return is_hfs_dot_str(path, "gitattributes");
 }
 
+int is_hfs_dotmailmap(const char *path)
+{
+	return is_hfs_dot_str(path, "mailmap");
+}
+
 const char utf8_bom[] = "\357\273\277";
 
 int skip_utf8_bom(char **text, size_t len)
diff --git a/utf8.h b/utf8.h
index fcd5167baf..9a16c8679d 100644
--- a/utf8.h
+++ b/utf8.h
@@ -61,6 +61,7 @@ int is_hfs_dotgit(const char *path);
 int is_hfs_dotgitmodules(const char *path);
 int is_hfs_dotgitignore(const char *path);
 int is_hfs_dotgitattributes(const char *path);
+int is_hfs_dotmailmap(const char *path);
 
 typedef enum {
 	ALIGN_LEFT,
-- 
2.30.1.986.gd86016a168

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning
  2021-02-15 23:18   ` [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning Jeff King
@ 2021-02-16  0:38     ` Ævar Arnfjörð Bjarmason
  2021-02-16  1:16       ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16  0:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Blake Burkhart, Junio C Hamano, git


On Tue, Feb 16 2021, Jeff King wrote:

> While there are some minor security implications to having these files
> be symlinks, this is out-weighed by the inconvenience of blocking
> historical commits in some projects that might include them.

Digging up the relevant thread that's the projects noted at
https://lore.kernel.org/git/20201027033518.GH2645313@google.com/ ?

I cloned the openmrn.git repository noted there, and checkout dies with:

    error: invalid path 'applications/clinic_app/targets/linux.x86/.gitignore'
    fatal: Could not reset index file to revision 'HEAD'.

I'm running a recent-ish snapshot of next at d98b1dd5eaa7, so with your
verify_path() change in current "seen".

So this series changes nothing about the checkout, just the fsck check?

I see there's your
https://lore.kernel.org/git/20201027075853.GH3005508@coredump.intra.peff.net/#t
to improve the !!symlink() codepath in apply.c

Still, it seems like a rather jarring gap in implementation to just warn
about this in fsck for the benefit of e.g. server operations, but then
hard die on the current client.

There seems to be no way around that hard die, and both repos in that
report are ones that are just symlinking .gitignore to a
../somedir/.gitignore deep in their own tree.

So aren't we both making the fsck check too loose and the client too
strict? Would anyone care if this was an error on fsck if we did the "is
outside repo?" check?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning
  2021-02-16  0:38     ` Ævar Arnfjörð Bjarmason
@ 2021-02-16  1:16       ` Jeff King
  2021-02-16  1:56         ` Junio C Hamano
  2021-02-16 12:48         ` Jeff King
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16  1:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

On Tue, Feb 16, 2021 at 01:38:30AM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Feb 16 2021, Jeff King wrote:
> 
> > While there are some minor security implications to having these files
> > be symlinks, this is out-weighed by the inconvenience of blocking
> > historical commits in some projects that might include them.
> 
> Digging up the relevant thread that's the projects noted at
> https://lore.kernel.org/git/20201027033518.GH2645313@google.com/ ?
> 
> I cloned the openmrn.git repository noted there, and checkout dies with:
> 
>     error: invalid path 'applications/clinic_app/targets/linux.x86/.gitignore'
>     fatal: Could not reset index file to revision 'HEAD'.
> 
> I'm running a recent-ish snapshot of next at d98b1dd5eaa7, so with your
> verify_path() change in current "seen".
> 
> So this series changes nothing about the checkout, just the fsck check?

Right.

> I see there's your
> https://lore.kernel.org/git/20201027075853.GH3005508@coredump.intra.peff.net/#t
> to improve the !!symlink() codepath in apply.c

That's a somewhat orthogonal approach, that tries to look at out-of-tree
symlinks (rather than banning all symlinks for these files).

I think it's worth banning them all, though; they don't actually work
as you'd expect (i.e., whenever we read them in-repo the symlinks do
nothing).

> Still, it seems like a rather jarring gap in implementation to just warn
> about this in fsck for the benefit of e.g. server operations, but then
> hard die on the current client.

It's not for the benefits of servers. It's because the solution for them
is to stop symlinking, which fixes clone/checkout of new commits. But
they'll still have those old trees hanging around in their history. If
everybody rejects them, then it becomes difficult to push/fetch at all.

That said, they'd probably want to checkout those old commits, too. So
we probably do need a config override, even if it's a broad one ("trust
me, this repo is OK, just allow symlinks for these special files").

> There seems to be no way around that hard die, and both repos in that
> report are ones that are just symlinking .gitignore to a
> ../somedir/.gitignore deep in their own tree.
> 
> So aren't we both making the fsck check too loose and the client too
> strict? Would anyone care if this was an error on fsck if we did the "is
> outside repo?" check?

An outside-the-repo check would probably be less invasive, but:

  - it still allows broken setups

  - it's significantly more complex. I know that the implementation I
    showed errs on the side of complaining in at least some cases
    (because it doesn't know if intermediate paths are themselves
    symlinks). But I'd worry there are also cases where it covers too
    little, nullifying the protection.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning
  2021-02-16  1:16       ` Jeff King
@ 2021-02-16  1:56         ` Junio C Hamano
  2021-02-16 12:54           ` Jeff King
  2021-02-16 12:48         ` Jeff King
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2021-02-16  1:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart, git

Jeff King <peff@peff.net> writes:

> That said, they'd probably want to checkout those old commits, too. So
> we probably do need a config override, even if it's a broad one ("trust
> me, this repo is OK, just allow symlinks for these special files").

Is this about the check that is overly strict for some existing
projects that kept the jk/symlinked-dotgitx-files topic in the
'seen' so far?

On the fsck end, we know we can demote the error level per
repository, but I wonder if we should make checkout/clone honor the
same setting?

I think GITMODULES_SYMLINK has been there for quite some time at
"error" level and we do want to discourage it to be a symbolic link,
so I am not quite sure what the demoting of these two achieves.  Why
aren't we having a similar issue on .gitmodules that is a symbolic
link?

Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning
  2021-02-16  1:16       ` Jeff King
  2021-02-16  1:56         ` Junio C Hamano
@ 2021-02-16 12:48         ` Jeff King
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff King @ 2021-02-16 12:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

On Mon, Feb 15, 2021 at 08:16:00PM -0500, Jeff King wrote:

> That said, they'd probably want to checkout those old commits, too. So
> we probably do need a config override, even if it's a broad one ("trust
> me, this repo is OK, just allow symlinks for these special files").

Another option here is setting core.symlinks to false. That works more
broadly than just the one symlink, though.

It might be possible to apply that same setting (perhaps automatically,
even) to just these .gitattributes, etc, metafiles. That may get tricky,
though, as we'd need to do it not just on checkout, but any time we're
considering the file (because we wouldn't want "git add" to re-add it as
a non-symlink, nor git-diff to report it, etc).

> > So aren't we both making the fsck check too loose and the client too
> > strict? Would anyone care if this was an error on fsck if we did the "is
> > outside repo?" check?
> 
> An outside-the-repo check would probably be less invasive, but:
> 
>   - it still allows broken setups
> 
>   - it's significantly more complex. I know that the implementation I
>     showed errs on the side of complaining in at least some cases
>     (because it doesn't know if intermediate paths are themselves
>     symlinks). But I'd worry there are also cases where it covers too
>     little, nullifying the protection.

Adding to the "complexity" point: it's also impossible to implement via
fsck, where we do not have the full path of the tree entry. We could
live without the fsck support if need be, though.

I am beginning to wonder if just opening them all with O_NOFOLLOW (and a
hacky 2-syscall fallback for portability) might be less ugly than all of
this.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning
  2021-02-16  1:56         ` Junio C Hamano
@ 2021-02-16 12:54           ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 12:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart, git

On Mon, Feb 15, 2021 at 05:56:50PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > That said, they'd probably want to checkout those old commits, too. So
> > we probably do need a config override, even if it's a broad one ("trust
> > me, this repo is OK, just allow symlinks for these special files").
> 
> Is this about the check that is overly strict for some existing
> projects that kept the jk/symlinked-dotgitx-files topic in the
> 'seen' so far?

Yes.

> On the fsck end, we know we can demote the error level per
> repository, but I wonder if we should make checkout/clone honor the
> same setting?

What would the default be? If it's permissive, then it feels like we are
not really solving much, as anybody who wanted to be careful can already
inspect the tree contents. This is about avoiding surprises in the
default config.

If it's to forbid by default, then yes, I think the "trust me this repo
is OK" I gave above would be a viable path forward.

> I think GITMODULES_SYMLINK has been there for quite some time at
> "error" level and we do want to discourage it to be a symbolic link,
> so I am not quite sure what the demoting of these two achieves.  Why
> aren't we having a similar issue on .gitmodules that is a symbolic
> link?

I think it's just less common to have symlinked .gitmodules. To be
clear, I think symlinked .gitignore is also pretty uncommon. Back when
we discussed this originally in 2018 I scanned most of GitHub and came
up with only a handful of repositories that did so.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 0/6] open in-tree files with O_NOFOLLOW
  2021-02-16 12:48         ` Jeff King
@ 2021-02-16 14:43           ` Jeff King
  2021-02-16 14:44             ` [PATCH 1/6] add open_nofollow() helper Jeff King
                               ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

On Tue, Feb 16, 2021 at 07:48:23AM -0500, Jeff King wrote:

> I am beginning to wonder if just opening them all with O_NOFOLLOW (and a
> hacky 2-syscall fallback for portability) might be less ugly than all of
> this.

So here's what that series might look like. It would replace all of this
verify_path() stuff entirely (and fsck, though we might want to add
detection to fsck just as an informational thing). It gives similar
protections, and would similarly force people using an in-tree symlink
to stop doing that. But it makes it much less of a pain to do so,
because they can still check out, etc; the symlinks just won't be
followed.

I think we could even use the same technique to roll back the
restrictions on .gitmodules being a symlink. That one makes me a bit
more nervous, just because we also write it. I _think_ that might be
safe, because we only do so using a temp file and rename(), which should
replace the symlink.

  [1/6]: add open_nofollow() helper
  [2/6]: attr: convert "macro_ok" into a flags field
  [3/6]: exclude: add flags parameter to add_patterns()
  [4/6]: attr: do not respect symlinks for in-tree .gitattributes
  [5/6]: exclude: do not respect symlinks for in-tree .gitignore
  [6/6]: mailmap: do not respect symlinks for in-tree .mailmap

 attr.c                    | 60 +++++++++++++++++++++++++--------------
 builtin/sparse-checkout.c |  8 +++---
 dir.c                     | 21 ++++++++++----
 dir.h                     |  3 +-
 git-compat-util.h         |  7 +++++
 mailmap.c                 | 22 ++++++++++----
 t/t0003-attributes.sh     | 36 +++++++++++++++++++++--
 t/t0008-ignores.sh        | 34 ++++++++++++++++++++++
 t/t4203-mailmap.sh        | 31 ++++++++++++++++++++
 wrapper.c                 | 16 +++++++++++
 10 files changed, 197 insertions(+), 41 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] add open_nofollow() helper
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
@ 2021-02-16 14:44             ` Jeff King
  2021-02-16 14:54               ` Jeff King
  2021-02-16 14:44             ` [PATCH 2/6] attr: convert "macro_ok" into a flags field Jeff King
                               ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

Some callers of open() would like to use O_NOFOLLOW, but it is not
available on all platforms. Let's abstract this into a helper function
so we can provide system-specific implementations.

Some light web-searching reveals that we might be able to get something
similar on Windows using FILE_FLAG_OPEN_REPARSE_POINT. I didn't dig into
this further.

For other systems without O_NOFOLLOW or any equivalent, we have two
options for fallback:

  - we can just open anyway, following symlinks; this may have security
    implications (e.g., following untrusted in-tree symlinks)

  - we can determine whether the path is a symlink with lstat().

    This is slower (two syscalls instead of one), but that may be
    acceptable for infrequent uses like looking up .gitattributes files
    (especially because we can get away with a single syscall for the
    common case of ENOENT).

    It's also racy, but should be sufficient for our needs (we are
    worried about in-tree symlinks that we ourselves would have
    previously created). We could make it non-racy at the cost of making
    it even slower, by doing an fstat() on the opened descriptor and
    comparing the dev/ino fields to the original lstat().

This patch implements the lstat() option in its slightly-faster racy
form.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-compat-util.h |  7 +++++++
 wrapper.c         | 16 ++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 838246289c..fd99eaeb6d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1231,6 +1231,13 @@ int access_or_die(const char *path, int mode, unsigned flag);
 /* Warn on an inaccessible file if errno indicates this is an error */
 int warn_on_fopen_errors(const char *path);
 
+/*
+ * Open with O_NOFOLLOW, or equivalent. Note that the fallback equivalent
+ * may be racy. Do not use this as protection against an attacker who can
+ * simultaneously create paths.
+ */
+int open_nofollow(const char *path, int flags);
+
 #if !defined(USE_PARENS_AROUND_GETTEXT_N) && defined(__GNUC__)
 #define USE_PARENS_AROUND_GETTEXT_N 1
 #endif
diff --git a/wrapper.c b/wrapper.c
index bcda41e374..563ad590df 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -678,3 +678,19 @@ int is_empty_or_missing_file(const char *filename)
 
 	return !st.st_size;
 }
+
+int open_nofollow(const char *path, int flags)
+{
+#ifdef O_NOFOLLOW
+	return open(path, flags | O_NOFOLLOW);
+#else
+	struct stat st;
+	if (lstat(path, &st) < 0)
+		return -1;
+	if (S_ISLNK(st.st_mode)) {
+		errno = ELOOP;
+		return -1;
+	}
+	return open(path, flags);
+#endif
+}
-- 
2.30.1.986.gd86016a168


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/6] attr: convert "macro_ok" into a flags field
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
  2021-02-16 14:44             ` [PATCH 1/6] add open_nofollow() helper Jeff King
@ 2021-02-16 14:44             ` Jeff King
  2021-02-16 14:44             ` [PATCH 3/6] exclude: add flags parameter to add_patterns() Jeff King
                               ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

The attribute code can have a rather deep callstack, through
which we have to pass the "macro_ok" flag. In anticipation
of adding other flags, let's convert this to a generic
bit-field.

Signed-off-by: Jeff King <peff@peff.net>
---
 attr.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/attr.c b/attr.c
index 4ef85d668b..7b0cab085f 100644
--- a/attr.c
+++ b/attr.c
@@ -278,6 +278,9 @@ struct match_attr {
 
 static const char blank[] = " \t\r\n";
 
+/* Flags usable in read_attr() and parse_attr_line() family of functions. */
+#define READ_ATTR_MACRO_OK (1<<0)
+
 /*
  * Parse a whitespace-delimited attribute state (i.e., "attr",
  * "-attr", "!attr", or "attr=value") from the string starting at src.
@@ -331,7 +334,7 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,
 }
 
 static struct match_attr *parse_attr_line(const char *line, const char *src,
-					  int lineno, int macro_ok)
+					  int lineno, unsigned flags)
 {
 	int namelen;
 	int num_attr, i;
@@ -355,7 +358,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
 
 	if (strlen(ATTRIBUTE_MACRO_PREFIX) < namelen &&
 	    starts_with(name, ATTRIBUTE_MACRO_PREFIX)) {
-		if (!macro_ok) {
+		if (!(flags & READ_ATTR_MACRO_OK)) {
 			fprintf_ln(stderr, _("%s not allowed: %s:%d"),
 				   name, src, lineno);
 			goto fail_return;
@@ -653,11 +656,11 @@ static void handle_attr_line(struct attr_stack *res,
 			     const char *line,
 			     const char *src,
 			     int lineno,
-			     int macro_ok)
+			     unsigned flags)
 {
 	struct match_attr *a;
 
-	a = parse_attr_line(line, src, lineno, macro_ok);
+	a = parse_attr_line(line, src, lineno, flags);
 	if (!a)
 		return;
 	ALLOC_GROW(res->attrs, res->num_matches + 1, res->alloc);
@@ -672,7 +675,8 @@ static struct attr_stack *read_attr_from_array(const char **list)
 
 	res = xcalloc(1, sizeof(*res));
 	while ((line = *(list++)) != NULL)
-		handle_attr_line(res, line, "[builtin]", ++lineno, 1);
+		handle_attr_line(res, line, "[builtin]", ++lineno,
+				 READ_ATTR_MACRO_OK);
 	return res;
 }
 
@@ -698,7 +702,7 @@ void git_attr_set_direction(enum git_attr_direction new_direction)
 	direction = new_direction;
 }
 
-static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
+static struct attr_stack *read_attr_from_file(const char *path, unsigned flags)
 {
 	FILE *fp = fopen_or_warn(path, "r");
 	struct attr_stack *res;
@@ -712,15 +716,15 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 		char *bufp = buf;
 		if (!lineno)
 			skip_utf8_bom(&bufp, strlen(bufp));
-		handle_attr_line(res, bufp, path, ++lineno, macro_ok);
+		handle_attr_line(res, bufp, path, ++lineno, flags);
 	}
 	fclose(fp);
 	return res;
 }
 
 static struct attr_stack *read_attr_from_index(const struct index_state *istate,
 					       const char *path,
-					       int macro_ok)
+					       unsigned flags)
 {
 	struct attr_stack *res;
 	char *buf, *sp;
@@ -741,35 +745,35 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
 		ep = strchrnul(sp, '\n');
 		more = (*ep == '\n');
 		*ep = '\0';
-		handle_attr_line(res, sp, path, ++lineno, macro_ok);
+		handle_attr_line(res, sp, path, ++lineno, flags);
 		sp = ep + more;
 	}
 	free(buf);
 	return res;
 }
 
 static struct attr_stack *read_attr(const struct index_state *istate,
-				    const char *path, int macro_ok)
+				    const char *path, unsigned flags)
 {
 	struct attr_stack *res = NULL;
 
 	if (direction == GIT_ATTR_INDEX) {
-		res = read_attr_from_index(istate, path, macro_ok);
+		res = read_attr_from_index(istate, path, flags);
 	} else if (!is_bare_repository()) {
 		if (direction == GIT_ATTR_CHECKOUT) {
-			res = read_attr_from_index(istate, path, macro_ok);
+			res = read_attr_from_index(istate, path, flags);
 			if (!res)
-				res = read_attr_from_file(path, macro_ok);
+				res = read_attr_from_file(path, flags);
 		} else if (direction == GIT_ATTR_CHECKIN) {
-			res = read_attr_from_file(path, macro_ok);
+			res = read_attr_from_file(path, flags);
 			if (!res)
 				/*
 				 * There is no checked out .gitattributes file
 				 * there, but we might have it in the index.
 				 * We allow operation in a sparsely checked out
 				 * work tree, so read from it.
 				 */
-				res = read_attr_from_index(istate, path, macro_ok);
+				res = read_attr_from_index(istate, path, flags);
 		}
 	}
 
@@ -844,6 +848,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
 				 struct attr_stack **stack)
 {
 	struct attr_stack *e;
+	unsigned flags = READ_ATTR_MACRO_OK;
 
 	if (*stack)
 		return;
@@ -854,23 +859,23 @@ static void bootstrap_attr_stack(const struct index_state *istate,
 
 	/* system-wide frame */
 	if (git_attr_system()) {
-		e = read_attr_from_file(git_etc_gitattributes(), 1);
+		e = read_attr_from_file(git_etc_gitattributes(), flags);
 		push_stack(stack, e, NULL, 0);
 	}
 
 	/* home directory */
 	if (get_home_gitattributes()) {
-		e = read_attr_from_file(get_home_gitattributes(), 1);
+		e = read_attr_from_file(get_home_gitattributes(), flags);
 		push_stack(stack, e, NULL, 0);
 	}
 
 	/* root directory */
-	e = read_attr(istate, GITATTRIBUTES_FILE, 1);
+	e = read_attr(istate, GITATTRIBUTES_FILE, flags);
 	push_stack(stack, e, xstrdup(""), 0);
 
 	/* info frame */
 	if (startup_info->have_repository)
-		e = read_attr_from_file(git_path_info_attributes(), 1);
+		e = read_attr_from_file(git_path_info_attributes(), flags);
 	else
 		e = NULL;
 	if (!e)
-- 
2.30.1.986.gd86016a168


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/6] exclude: add flags parameter to add_patterns()
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
  2021-02-16 14:44             ` [PATCH 1/6] add open_nofollow() helper Jeff King
  2021-02-16 14:44             ` [PATCH 2/6] attr: convert "macro_ok" into a flags field Jeff King
@ 2021-02-16 14:44             ` Jeff King
  2021-02-16 14:44             ` [PATCH 4/6] attr: do not respect symlinks for in-tree .gitattributes Jeff King
                               ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

There are a number of callers of add_patterns() and its sibling
functions. Let's give them a "flags" parameter for adding new options
without having to touch each caller. We'll use this in a future patch to
add O_NOFOLLOW support. But for now each caller just passes 0.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/sparse-checkout.c |  8 ++++----
 dir.c                     | 13 +++++++------
 dir.h                     |  3 ++-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 2306a9ad98..d7da50ada5 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -64,7 +64,7 @@ static int sparse_checkout_list(int argc, const char **argv)
 	pl.use_cone_patterns = core_sparse_checkout_cone;
 
 	sparse_filename = get_sparse_checkout_filename();
-	res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL);
+	res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL, 0);
 	free(sparse_filename);
 
 	if (res < 0) {
@@ -321,7 +321,7 @@ static int sparse_checkout_init(int argc, const char **argv)
 	memset(&pl, 0, sizeof(pl));
 
 	sparse_filename = get_sparse_checkout_filename();
-	res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL);
+	res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL, 0);
 
 	/* If we already have a sparse-checkout file, use it. */
 	if (res >= 0) {
@@ -483,7 +483,7 @@ static void add_patterns_cone_mode(int argc, const char **argv,
 	existing.use_cone_patterns = core_sparse_checkout_cone;
 
 	if (add_patterns_from_file_to_list(sparse_filename, "", 0,
-					   &existing, NULL))
+					   &existing, NULL, 0))
 		die(_("unable to load existing sparse-checkout patterns"));
 	free(sparse_filename);
 
@@ -507,7 +507,7 @@ static void add_patterns_literal(int argc, const char **argv,
 {
 	char *sparse_filename = get_sparse_checkout_filename();
 	if (add_patterns_from_file_to_list(sparse_filename, "", 0,
-					   pl, NULL))
+					   pl, NULL, 0))
 		die(_("unable to load existing sparse-checkout patterns"));
 	free(sparse_filename);
 	add_patterns_from_input(pl, argc, argv);
diff --git a/dir.c b/dir.c
index d153a63bbd..f7fb1db718 100644
--- a/dir.c
+++ b/dir.c
@@ -1046,7 +1046,7 @@ static int add_patterns_from_buffer(char *buf, size_t size,
  */
 static int add_patterns(const char *fname, const char *base, int baselen,
 			struct pattern_list *pl, struct index_state *istate,
-			struct oid_stat *oid_stat)
+			unsigned flags, struct oid_stat *oid_stat)
 {
 	struct stat st;
 	int r;
@@ -1143,9 +1143,10 @@ static int add_patterns_from_buffer(char *buf, size_t size,
 
 int add_patterns_from_file_to_list(const char *fname, const char *base,
 				   int baselen, struct pattern_list *pl,
-				   struct index_state *istate)
+				   struct index_state *istate,
+				   unsigned flags)
 {
-	return add_patterns(fname, base, baselen, pl, istate, NULL);
+	return add_patterns(fname, base, baselen, pl, istate, flags, NULL);
 }
 
 int add_patterns_from_blob_to_list(
@@ -1194,7 +1195,7 @@ static void add_patterns_from_file_1(struct dir_struct *dir, const char *fname,
 	if (!dir->untracked)
 		dir->unmanaged_exclude_files++;
 	pl = add_pattern_list(dir, EXC_FILE, fname);
-	if (add_patterns(fname, "", 0, pl, NULL, oid_stat) < 0)
+	if (add_patterns(fname, "", 0, pl, NULL, 0, oid_stat) < 0)
 		die(_("cannot use %s as an exclude file"), fname);
 }
 
@@ -1557,7 +1558,7 @@ static void prep_exclude(struct dir_struct *dir,
 			strbuf_addbuf(&sb, &dir->basebuf);
 			strbuf_addstr(&sb, dir->exclude_per_dir);
 			pl->src = strbuf_detach(&sb, NULL);
-			add_patterns(pl->src, pl->src, stk->baselen, pl, istate,
+			add_patterns(pl->src, pl->src, stk->baselen, pl, istate, 0,
 				     untracked ? &oid_stat : NULL);
 		}
 		/*
@@ -3009,7 +3010,7 @@ int get_sparse_checkout_patterns(struct pattern_list *pl)
 	char *sparse_filename = get_sparse_checkout_filename();
 
 	pl->use_cone_patterns = core_sparse_checkout_cone;
-	res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL);
+	res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL, 0);
 
 	free(sparse_filename);
 	return res;
diff --git a/dir.h b/dir.h
index facfae4740..04d886cfce 100644
--- a/dir.h
+++ b/dir.h
@@ -420,7 +420,8 @@ int hashmap_contains_parent(struct hashmap *map,
 struct pattern_list *add_pattern_list(struct dir_struct *dir,
 				      int group_type, const char *src);
 int add_patterns_from_file_to_list(const char *fname, const char *base, int baselen,
-				   struct pattern_list *pl, struct  index_state *istate);
+				   struct pattern_list *pl, struct index_state *istate,
+				   unsigned flags);
 void add_patterns_from_file(struct dir_struct *, const char *fname);
 int add_patterns_from_blob_to_list(struct object_id *oid,
 				   const char *base, int baselen,
-- 
2.30.1.986.gd86016a168


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 4/6] attr: do not respect symlinks for in-tree .gitattributes
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
                               ` (2 preceding siblings ...)
  2021-02-16 14:44             ` [PATCH 3/6] exclude: add flags parameter to add_patterns() Jeff King
@ 2021-02-16 14:44             ` Jeff King
  2021-02-16 14:44             ` [PATCH 5/6] exclude: do not respect symlinks for in-tree .gitignore Jeff King
                               ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

The attributes system may sometimes read in-tree files from the
filesystem, and sometimes from the index. In the latter case, we do not
resolve symbolic links (and are not likely to ever start doing so).
Let's open filesystem links with O_NOFOLLOW so that the two cases behave
consistently.

As a bonus, this means that git will not follow such symlinks to read
and parse out-of-tree paths. In some cases this could have security
implications, as a malicious repository can cause Git to open and read
arbitrary files. It could already feed arbitrary content to the parser,
but in certain setups it might be able to exfiltrate data from those
paths (e.g., if an automated service operating on the malicious repo
reveals its stderr to an attacker).

Note that O_NOFOLLOW only prevents following links for the path itself,
not intermediate directories in the path.  At first glance, it seems
like

  ln -s /some/path in-repo

might still look at "in-repo/.gitattributes", following the symlink to
"/some/path/.gitattributes". However, if "in-repo" is a symbolic link,
then we know that it has no git paths below it, and will never look at
its .gitattributes file.

We will continue to support out-of-tree symbolic links (e.g., in
$GIT_DIR/info/attributes); this just affects in-tree links. When a
symbolic link is encountered, the contents are ignored and a warning is
printed. POSIX specifies ELOOP in this case, so the user would generally
see something like:

  warning: unable to access '.gitattributes': Too many levels of symbolic links

Signed-off-by: Jeff King <peff@peff.net>
---
 attr.c                | 19 +++++++++++++++----
 t/t0003-attributes.sh | 36 +++++++++++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/attr.c b/attr.c
index 7b0cab085f..a28177915b 100644
--- a/attr.c
+++ b/attr.c
@@ -280,6 +280,7 @@ static const char blank[] = " \t\r\n";
 
 /* Flags usable in read_attr() and parse_attr_line() family of functions. */
 #define READ_ATTR_MACRO_OK (1<<0)
+#define READ_ATTR_NOFOLLOW (1<<1)
 
 /*
  * Parse a whitespace-delimited attribute state (i.e., "attr",
@@ -704,13 +705,23 @@ void git_attr_set_direction(enum git_attr_direction new_direction)
 
 static struct attr_stack *read_attr_from_file(const char *path, unsigned flags)
 {
-	FILE *fp = fopen_or_warn(path, "r");
+	int fd;
+	FILE *fp;
 	struct attr_stack *res;
 	char buf[2048];
 	int lineno = 0;
 
-	if (!fp)
+	if (flags & READ_ATTR_NOFOLLOW)
+		fd = open_nofollow(path, O_RDONLY);
+	else
+		fd = open(path, O_RDONLY);
+
+	if (fd < 0) {
+		warn_on_fopen_errors(path);
 		return NULL;
+	}
+	fp = xfdopen(fd, "r");
+
 	res = xcalloc(1, sizeof(*res));
 	while (fgets(buf, sizeof(buf), fp)) {
 		char *bufp = buf;
@@ -870,7 +881,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
 	}
 
 	/* root directory */
-	e = read_attr(istate, GITATTRIBUTES_FILE, flags);
+	e = read_attr(istate, GITATTRIBUTES_FILE, flags | READ_ATTR_NOFOLLOW);
 	push_stack(stack, e, xstrdup(""), 0);
 
 	/* info frame */
@@ -961,7 +972,7 @@ static void prepare_attr_stack(const struct index_state *istate,
 		strbuf_add(&pathbuf, path + pathbuf.len, (len - pathbuf.len));
 		strbuf_addf(&pathbuf, "/%s", GITATTRIBUTES_FILE);
 
-		next = read_attr(istate, pathbuf.buf, 0);
+		next = read_attr(istate, pathbuf.buf, READ_ATTR_NOFOLLOW);
 
 		/* reset the pathbuf to not include "/.gitattributes" */
 		strbuf_setlen(&pathbuf, len);
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index b660593c20..1e4c672b84 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -4,12 +4,16 @@ test_description=gitattributes
 
 . ./test-lib.sh
 
-attr_check () {
+attr_check_basic () {
 	path="$1" expect="$2" git_opts="$3" &&
 
 	git $git_opts check-attr test -- "$path" >actual 2>err &&
 	echo "$path: test: $expect" >expect &&
-	test_cmp expect actual &&
+	test_cmp expect actual
+}
+
+attr_check () {
+	attr_check_basic "$@" &&
 	test_must_be_empty err
 }
 
@@ -331,12 +335,38 @@ test_expect_success 'binary macro expanded by -a' '
 	test_cmp expect actual
 '
 
-
 test_expect_success 'query binary macro directly' '
 	echo "file binary" >.gitattributes &&
 	echo file: binary: set >expect &&
 	git check-attr binary file >actual &&
 	test_cmp expect actual
 '
 
+test_expect_success SYMLINKS 'set up symlink tests' '
+	echo "* test" >attr &&
+	rm -f .gitattributes
+'
+
+test_expect_success SYMLINKS 'symlinks respected in core.attributesFile' '
+	test_when_finished "rm symlink" &&
+	ln -s attr symlink &&
+	test_config core.attributesFile "$(pwd)/symlink" &&
+	attr_check file set
+'
+
+test_expect_success SYMLINKS 'symlinks respected in info/attributes' '
+	test_when_finished "rm .git/info/attributes" &&
+	ln -s ../../attr .git/info/attributes &&
+	attr_check file set
+'
+
+test_expect_success SYMLINKS 'symlinks not respected in-tree' '
+	test_when_finished "rm -rf .gitattributes subdir" &&
+	ln -s attr .gitattributes &&
+	mkdir subdir &&
+	ln -s ../attr subdir/.gitattributes &&
+	attr_check_basic subdir/file unspecified &&
+	test_i18ngrep "unable to access.*gitattributes" err
+'
+
 test_done
-- 
2.30.1.986.gd86016a168


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 5/6] exclude: do not respect symlinks for in-tree .gitignore
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
                               ` (3 preceding siblings ...)
  2021-02-16 14:44             ` [PATCH 4/6] attr: do not respect symlinks for in-tree .gitattributes Jeff King
@ 2021-02-16 14:44             ` Jeff King
  2021-02-16 14:44             ` [PATCH 6/6] mailmap: do not respect symlinks for in-tree .mailmap Jeff King
  2021-02-25 19:25             ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Junio C Hamano
  6 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

As with .gitattributes, we would like to make sure that .gitignore files
are handled consistently whether read from the index or from the
filesystem. Likewise, we would like to avoid reading out-of-tree files
pointed to by the symlinks, which could have security implications in
certain setups.

We can cover both by using open_nofollow() when opening the in-tree
files. We'll continue to follow links for core.excludesFile, as well as
$GIT_DIR/info/exclude.

Signed-off-by: Jeff King <peff@peff.net>
---
 dir.c              | 12 ++++++++++--
 t/t0008-ignores.sh | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index f7fb1db718..3692a28186 100644
--- a/dir.c
+++ b/dir.c
@@ -1035,6 +1035,9 @@ static int add_patterns_from_buffer(char *buf, size_t size,
 				    const char *base, int baselen,
 				    struct pattern_list *pl);
 
+/* Flags for add_patterns() */
+#define PATTERN_NOFOLLOW (1<<0)
+
 /*
  * Given a file with name "fname", read it (either from disk, or from
  * an index if 'istate' is non-null), parse it and store the
@@ -1054,7 +1057,11 @@ static int add_patterns(const char *fname, const char *base, int baselen,
 	size_t size = 0;
 	char *buf;
 
-	fd = open(fname, O_RDONLY);
+	if (flags & PATTERN_NOFOLLOW)
+		fd = open_nofollow(fname, O_RDONLY);
+	else
+		fd = open(fname, O_RDONLY);
+
 	if (fd < 0 || fstat(fd, &st) < 0) {
 		if (fd < 0)
 			warn_on_fopen_errors(fname);
@@ -1558,7 +1565,8 @@ static void prep_exclude(struct dir_struct *dir,
 			strbuf_addbuf(&sb, &dir->basebuf);
 			strbuf_addstr(&sb, dir->exclude_per_dir);
 			pl->src = strbuf_detach(&sb, NULL);
-			add_patterns(pl->src, pl->src, stk->baselen, pl, istate, 0,
+			add_patterns(pl->src, pl->src, stk->baselen, pl, istate,
+				     PATTERN_NOFOLLOW,
 				     untracked ? &oid_stat : NULL);
 		}
 		/*
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 370a389e5c..854cfda11f 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -865,4 +865,38 @@ test_expect_success 'info/exclude trumps core.excludesfile' '
 	test_cmp expect actual
 '
 
+test_expect_success SYMLINKS 'set up ignore file for symlink tests' '
+	echo "*" >ignore &&
+	rm -f .gitignore .git/info/exclude
+'
+
+test_expect_success SYMLINKS 'symlinks respected in core.excludesFile' '
+	test_when_finished "rm symlink" &&
+	ln -s ignore symlink &&
+	test_config core.excludesFile "$(pwd)/symlink" &&
+	echo file >expect &&
+	git check-ignore file >actual 2>err &&
+	test_cmp expect actual &&
+	test_must_be_empty err
+'
+
+test_expect_success SYMLINKS 'symlinks respected in info/exclude' '
+	test_when_finished "rm .git/info/exclude" &&
+	ln -s ../../ignore .git/info/exclude &&
+	echo file >expect &&
+	git check-ignore file >actual 2>err &&
+	test_cmp expect actual &&
+	test_must_be_empty err
+'
+
+test_expect_success SYMLINKS 'symlinks not respected in-tree' '
+	test_when_finished "rm .gitignore" &&
+	ln -s ignore .gitignore &&
+	mkdir subdir &&
+	ln -s ignore subdir/.gitignore &&
+	test_must_fail git check-ignore subdir/file >actual 2>err &&
+	test_must_be_empty actual &&
+	test_i18ngrep "unable to access.*gitignore" err
+'
+
 test_done
-- 
2.30.1.986.gd86016a168


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 6/6] mailmap: do not respect symlinks for in-tree .mailmap
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
                               ` (4 preceding siblings ...)
  2021-02-16 14:44             ` [PATCH 5/6] exclude: do not respect symlinks for in-tree .gitignore Jeff King
@ 2021-02-16 14:44             ` Jeff King
  2021-02-16 14:57               ` Jeff King
  2021-02-25 19:25             ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Junio C Hamano
  6 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

As with .gitattributes and .gitignore, we would like to make sure that
.mailmap files are handled consistently whether read from the a blob (as
is the default behavior in a bare repo) or from the filesystem.
Likewise, we would like to avoid reading out-of-tree files pointed to by
a symlink, which could have security implications in certain setups.

We can cover both by using open_nofollow() when opening the in-tree
files. We'll continue to follow links for mailmap.file, as well as when
reading .mailmap from the current directory when outside of a repository
entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 mailmap.c          | 22 +++++++++++++++++-----
 t/t4203-mailmap.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index eb77c6e77c..7ac966107e 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -157,20 +157,30 @@ static void read_mailmap_line(struct string_list *map, char *buffer)
 		add_mapping(map, name1, email1, name2, email2);
 }
 
-static int read_mailmap_file(struct string_list *map, const char *filename)
+/* Flags for read_mailmap_file() */
+#define MAILMAP_NOFOLLOW (1<<0)
+
+static int read_mailmap_file(struct string_list *map, const char *filename,
+			     unsigned flags)
 {
 	char buffer[1024];
 	FILE *f;
+	int fd;
 
 	if (!filename)
 		return 0;
 
-	f = fopen(filename, "r");
-	if (!f) {
+	if (flags & MAILMAP_NOFOLLOW)
+		fd = open_nofollow(filename, O_RDONLY);
+	else
+		fd = open(filename, O_RDONLY);
+
+	if (fd < 0) {
 		if (errno == ENOENT)
 			return 0;
 		return error_errno("unable to open mailmap at %s", filename);
 	}
+	f = xfdopen(fd, "r");
 
 	while (fgets(buffer, sizeof(buffer), f) != NULL)
 		read_mailmap_line(map, buffer);
@@ -225,10 +235,12 @@ int read_mailmap(struct string_list *map)
 	if (!git_mailmap_blob && is_bare_repository())
 		git_mailmap_blob = "HEAD:.mailmap";
 
-	err |= read_mailmap_file(map, ".mailmap");
+	err |= read_mailmap_file(map, ".mailmap",
+				 startup_info->have_repository ?
+				 MAILMAP_NOFOLLOW : 0);
 	if (startup_info->have_repository)
 		err |= read_mailmap_blob(map, git_mailmap_blob);
-	err |= read_mailmap_file(map, git_mailmap_file);
+	err |= read_mailmap_file(map, git_mailmap_file, 0);
 	return err;
 }
 
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 621f9962d5..96a4e6132f 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -889,4 +889,35 @@ test_expect_success 'empty syntax: setup' '
 	test_cmp expect actual
 '
 
+test_expect_success SYMLINKS 'set up symlink tests' '
+	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
+	echo "New <new@example.com> <orig@example.com>" >map &&
+	rm -f .mailmap
+'
+
+test_expect_success SYMLINKS 'symlinks respected in mailmap.file' '
+	test_when_finished "rm symlink" &&
+	ln -s map symlink &&
+	git -c mailmap.file="$(pwd)/symlink" log -1 --format=%aE >actual &&
+	echo "new@example.com" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success SYMLINKS 'symlinks respected in non-repo shortlog' '
+	git log -1 >input &&
+	test_when_finished "nongit rm .mailmap" &&
+	nongit ln -sf "$TRASH_DIRECTORY/map" .mailmap &&
+	nongit git shortlog -s <input >actual &&
+	echo "     1	New" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success SYMLINKS 'symlinks not respected in-tree' '
+	test_when_finished "rm .mailmap" &&
+	ln -s map .mailmap &&
+	git log -1 --format=%aE >actual &&
+	echo "orig@example.com" >expect&&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.30.1.986.gd86016a168

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] add open_nofollow() helper
  2021-02-16 14:44             ` [PATCH 1/6] add open_nofollow() helper Jeff King
@ 2021-02-16 14:54               ` Jeff King
  2021-02-16 15:44                 ` Taylor Blau
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

On Tue, Feb 16, 2021 at 09:44:23AM -0500, Jeff King wrote:

>   - we can determine whether the path is a symlink with lstat().
> 
>     This is slower (two syscalls instead of one), but that may be
>     acceptable for infrequent uses like looking up .gitattributes files
>     (especially because we can get away with a single syscall for the
>     common case of ENOENT).
> 
>     It's also racy, but should be sufficient for our needs (we are
>     worried about in-tree symlinks that we ourselves would have
>     previously created). We could make it non-racy at the cost of making
>     it even slower, by doing an fstat() on the opened descriptor and
>     comparing the dev/ino fields to the original lstat().
> 
> This patch implements the lstat() option in its slightly-faster racy
> form.

I manually compared the performance of the O_NOFOLLOW and fallback paths
by running:

  git ls-files | git check-attr --stdin -a

on linux.git, which would try to look at quite a few in-tree attribute
files. The slowdown didn't seem measurable (in fact, the fallback seems
about 1% faster even over many trials, but I think it's just noise).
Both take ~50ms.

Which makes sense, because there's only one .gitattributes file, so most
of the lookups are hitting ENOENT on the lstat() and not even calling
open(). If I do:

  find * -type d | sed 's,$,/.gitattributes,' | xargs touch

then the O_NOFOLLOW case takes ~54ms (which makes sense; we're opening a
bunch of extra empty attribute files), and the fallback case goes to
~56ms. So the difference becomes measurable, but I wonder if it's worth
even caring about. That's 2ms on a pathological case. I'd even be
tempted to implement the non-racy version with the extra fstat(). I
don't think we need it, but just as a least-surprise thing.

This is all on Linux, of course. Perhaps other systems with slower
syscalls may be more impacted.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] mailmap: do not respect symlinks for in-tree .mailmap
  2021-02-16 14:44             ` [PATCH 6/6] mailmap: do not respect symlinks for in-tree .mailmap Jeff King
@ 2021-02-16 14:57               ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 14:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Blake Burkhart, Junio C Hamano, git

On Tue, Feb 16, 2021 at 09:44:37AM -0500, Jeff King wrote:

> @@ -225,10 +235,12 @@ int read_mailmap(struct string_list *map)
>  	if (!git_mailmap_blob && is_bare_repository())
>  		git_mailmap_blob = "HEAD:.mailmap";
>  
> -	err |= read_mailmap_file(map, ".mailmap");
> +	err |= read_mailmap_file(map, ".mailmap",
> +				 startup_info->have_repository ?
> +				 MAILMAP_NOFOLLOW : 0);

This conflicts with my jk/mailmap-only-at-root topic currently in
"next", of course. Ditto for the new tests.

The resolution is pretty straight-forward, though:

diff --cc mailmap.c
index 9bb9cf8b30,7ac966107e..0000000000
--- a/mailmap.c
+++ b/mailmap.c
@@@ -225,11 -235,12 +235,13 @@@ int read_mailmap(struct string_list *ma
  	if (!git_mailmap_blob && is_bare_repository())
  		git_mailmap_blob = "HEAD:.mailmap";
  
 -	err |= read_mailmap_file(map, ".mailmap",
 -				 startup_info->have_repository ?
 -				 MAILMAP_NOFOLLOW : 0);
 +	if (!startup_info->have_repository || !is_bare_repository())
- 		err |= read_mailmap_file(map, ".mailmap");
++		err |= read_mailmap_file(map, ".mailmap",
++					 startup_info->have_repository ?
++					 MAILMAP_NOFOLLOW : 0);
  	if (startup_info->have_repository)
  		err |= read_mailmap_blob(map, git_mailmap_blob);
- 	err |= read_mailmap_file(map, git_mailmap_file);
+ 	err |= read_mailmap_file(map, git_mailmap_file, 0);
  	return err;
  }
  
diff --cc t/t4203-mailmap.sh
index 93caf9a46d,96a4e6132f..0000000000
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@@ -889,47 -889,35 +889,78 @@@ test_expect_success 'empty syntax: setu
  	test_cmp expect actual
  '
  
 +test_expect_success 'set up mailmap location tests' '
 +	git init --bare loc-bare &&
 +	git --git-dir=loc-bare --work-tree=. commit \
 +		--allow-empty -m foo --author="Orig <orig@example.com>" &&
 +	echo "New <new@example.com> <orig@example.com>" >loc-bare/.mailmap
 +'
 +
 +test_expect_success 'bare repo with --work-tree finds mailmap at top-level' '
 +	git -C loc-bare --work-tree=. log -1 --format=%aE >actual &&
 +	echo new@example.com >expect &&
 +	test_cmp expect actual
 +'
 +
 +test_expect_success 'bare repo does not look in current directory' '
 +	git -C loc-bare log -1 --format=%aE >actual &&
 +	echo orig@example.com >expect &&
 +	test_cmp expect actual
 +'
 +
 +test_expect_success 'non-git shortlog respects mailmap in current dir' '
 +	git --git-dir=loc-bare log -1 >input &&
 +	nongit cp "$TRASH_DIRECTORY/loc-bare/.mailmap" . &&
 +	nongit git shortlog -s <input >actual &&
 +	echo "     1	New" >expect &&
 +	test_cmp expect actual
 +'
 +
 +test_expect_success 'shortlog on stdin respects mailmap from repo' '
 +	cp loc-bare/.mailmap . &&
 +	git shortlog -s <input >actual &&
 +	echo "     1	New" >expect &&
 +	test_cmp expect actual
 +'
 +
 +test_expect_success 'find top-level mailmap from subdir' '
 +	git clone loc-bare loc-wt &&
 +	cp loc-bare/.mailmap loc-wt &&
 +	mkdir loc-wt/subdir &&
 +	git -C loc-wt/subdir log -1 --format=%aE >actual &&
 +	echo new@example.com >expect &&
 +	test_cmp expect actual
 +'
 +
+ test_expect_success SYMLINKS 'set up symlink tests' '
+ 	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
+ 	echo "New <new@example.com> <orig@example.com>" >map &&
+ 	rm -f .mailmap
+ '
+ 
+ test_expect_success SYMLINKS 'symlinks respected in mailmap.file' '
+ 	test_when_finished "rm symlink" &&
+ 	ln -s map symlink &&
+ 	git -c mailmap.file="$(pwd)/symlink" log -1 --format=%aE >actual &&
+ 	echo "new@example.com" >expect &&
+ 	test_cmp expect actual
+ '
+ 
+ test_expect_success SYMLINKS 'symlinks respected in non-repo shortlog' '
+ 	git log -1 >input &&
+ 	test_when_finished "nongit rm .mailmap" &&
+ 	nongit ln -sf "$TRASH_DIRECTORY/map" .mailmap &&
+ 	nongit git shortlog -s <input >actual &&
+ 	echo "     1	New" >expect &&
+ 	test_cmp expect actual
+ '
+ 
+ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
+ 	test_when_finished "rm .mailmap" &&
+ 	ln -s map .mailmap &&
+ 	git log -1 --format=%aE >actual &&
+ 	echo "orig@example.com" >expect&&
+ 	test_cmp expect actual
+ '
+ 
  test_done

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] add open_nofollow() helper
  2021-02-16 14:54               ` Jeff King
@ 2021-02-16 15:44                 ` Taylor Blau
  2021-02-16 16:02                   ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Taylor Blau @ 2021-02-16 15:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart,
	Junio C Hamano, git

On Tue, Feb 16, 2021 at 09:54:24AM -0500, Jeff King wrote:
> This is all on Linux, of course. Perhaps other systems with slower
> syscalls may be more impacted.

I timed it on macOS, which (as you know) I don't use for daily
development, but it's a useful testbed from time to time.

On your branch, 'git check-attr -a' took 193.4ms with O_NOFOLLOW, and
245.3ms without. After touching every .gitattributes file, those numbers
shot up to 340.9ms and 346.6ms, respectively.

(All numbers on linux.git, of course).

There isn't an apples-to-apples comparison between my numbers and yours
(since my laptop is much slower than yours), but the relative numbers
are quite clear that only doing a single syscall is worth it in the
non-pathological case.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] add open_nofollow() helper
  2021-02-16 15:44                 ` Taylor Blau
@ 2021-02-16 16:02                   ` Jeff King
  2021-02-16 16:07                     ` Taylor Blau
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2021-02-16 16:02 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart,
	Junio C Hamano, git

On Tue, Feb 16, 2021 at 10:44:00AM -0500, Taylor Blau wrote:

> On Tue, Feb 16, 2021 at 09:54:24AM -0500, Jeff King wrote:
> > This is all on Linux, of course. Perhaps other systems with slower
> > syscalls may be more impacted.
> 
> I timed it on macOS, which (as you know) I don't use for daily
> development, but it's a useful testbed from time to time.
> 
> On your branch, 'git check-attr -a' took 193.4ms with O_NOFOLLOW, and
> 245.3ms without. After touching every .gitattributes file, those numbers
> shot up to 340.9ms and 346.6ms, respectively.
> 
> (All numbers on linux.git, of course).

That seems...really weird. Your "after touching every file" case is
slightly slower, which is totally expected. But why would the _before_
case be so different? It should have made exactly one extra lstat()
call, and replaced a bunch of open(NOFOLLOW) calls with lstat() (and if
those were so expensive, we'd have presumably seen it in the "touching
every file" case which is running _both_ syscalls).

Can you double-check your initial timings?

> There isn't an apples-to-apples comparison between my numbers and yours
> (since my laptop is much slower than yours), but the relative numbers
> are quite clear that only doing a single syscall is worth it in the
> non-pathological case.

I guess in a sense it doesn't matter that much how macos performs. If it
has O_NOFOLLOW, then it's a no-brainer to use it. The bigger question
is: how many platforms don't have it, and what are _their_ syscalls like
(and are they niche enough that we can accept a small slowdown)?

The obvious one we'd care the most about is Windows, but I still hold
out hope that there's some equivalent mechanism there we can use.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] add open_nofollow() helper
  2021-02-16 16:02                   ` Jeff King
@ 2021-02-16 16:07                     ` Taylor Blau
  2021-02-16 16:11                       ` Taylor Blau
  0 siblings, 1 reply; 25+ messages in thread
From: Taylor Blau @ 2021-02-16 16:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart,
	Junio C Hamano, git

On Tue, Feb 16, 2021 at 11:02:23AM -0500, Jeff King wrote:
> Can you double-check your initial timings?

Aha, I forgot to update the input to the second check-attr tests after
putting .gitattributes files everywhere.

Rerunning with O_NOFOLLOW, the initial timings look something like
128.8ms and 183.7ms before/after.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] add open_nofollow() helper
  2021-02-16 16:07                     ` Taylor Blau
@ 2021-02-16 16:11                       ` Taylor Blau
  2021-02-16 16:19                         ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Taylor Blau @ 2021-02-16 16:11 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart,
	Junio C Hamano, git

On Tue, Feb 16, 2021 at 11:07:39AM -0500, Taylor Blau wrote:
> On Tue, Feb 16, 2021 at 11:02:23AM -0500, Jeff King wrote:
> > Can you double-check your initial timings?
>
> Aha, I forgot to update the input to the second check-attr tests after
> putting .gitattributes files everywhere.
>
> Rerunning with O_NOFOLLOW, the initial timings look something like
> 128.8ms and 183.7ms before/after.

I should add that "before" refers to a clean checkout, and "after"
refers to the state after running 'find * -type d | ... | xargs touch'.
Both of those numbers are with the O_NOFOLLOW branch.

If I repeat the test after applying:

diff --git a/wrapper.c b/wrapper.c
index 563ad590df..90f603e749 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -681,7 +681,7 @@ int is_empty_or_missing_file(const char *filename)

 int open_nofollow(const char *path, int flags)
 {
-#ifdef O_NOFOLLOW
+#if 0
        return open(path, flags | O_NOFOLLOW);
 #else
        struct stat st;

Then those numbers go from 155.9ms to 197.2ms.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] add open_nofollow() helper
  2021-02-16 16:11                       ` Taylor Blau
@ 2021-02-16 16:19                         ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-16 16:19 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart,
	Junio C Hamano, git

On Tue, Feb 16, 2021 at 11:11:55AM -0500, Taylor Blau wrote:

> On Tue, Feb 16, 2021 at 11:07:39AM -0500, Taylor Blau wrote:
> > On Tue, Feb 16, 2021 at 11:02:23AM -0500, Jeff King wrote:
> > > Can you double-check your initial timings?
> >
> > Aha, I forgot to update the input to the second check-attr tests after
> > putting .gitattributes files everywhere.
> >
> > Rerunning with O_NOFOLLOW, the initial timings look something like
> > 128.8ms and 183.7ms before/after.
> 
> I should add that "before" refers to a clean checkout, and "after"
> refers to the state after running 'find * -type d | ... | xargs touch'.
> Both of those numbers are with the O_NOFOLLOW branch.
> 
> If I repeat the test after applying:
> 
> diff --git a/wrapper.c b/wrapper.c
> index 563ad590df..90f603e749 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -681,7 +681,7 @@ int is_empty_or_missing_file(const char *filename)
> 
>  int open_nofollow(const char *path, int flags)
>  {
> -#ifdef O_NOFOLLOW
> +#if 0
>         return open(path, flags | O_NOFOLLOW);
>  #else
>         struct stat st;
> 
> Then those numbers go from 155.9ms to 197.2ms.

OK, thanks. So to lay out all cases:

             one file | many files
            +--------------------
O_NOFOLLOW  | 128.8ms | 183.7ms
lstat+open  | 155.9ms | 197.2ms

The jump for both from "one" to "many" files for both rows is expected;
there's more work to do. The jump from 183 to 197 in the "many" column
is what I was wanting to measure (how expensive is the fallback code),
and is around the order of magnitude I'd expect (and IMHO probably
acceptable).

But the jump in the "one file" column between the two code paths is
really confusing. Those are making the same number of syscalls. It's
possible that lstat() is way more expensive than open(), but in that
case I'd expect to see a similarly large jump in the "many" column.
Weird.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/6] open in-tree files with O_NOFOLLOW
  2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
                               ` (5 preceding siblings ...)
  2021-02-16 14:44             ` [PATCH 6/6] mailmap: do not respect symlinks for in-tree .mailmap Jeff King
@ 2021-02-25 19:25             ` Junio C Hamano
  2021-02-26  6:35               ` Jeff King
  6 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2021-02-25 19:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart, git

Jeff King <peff@peff.net> writes:

> On Tue, Feb 16, 2021 at 07:48:23AM -0500, Jeff King wrote:
>
>> I am beginning to wonder if just opening them all with O_NOFOLLOW (and a
>> hacky 2-syscall fallback for portability) might be less ugly than all of
>> this.
>
> So here's what that series might look like. It would replace all of this
> verify_path() stuff entirely (and fsck, though we might want to add
> detection to fsck just as an informational thing). It gives similar
> protections, and would similarly force people using an in-tree symlink
> to stop doing that. But it makes it much less of a pain to do so,
> because they can still check out, etc; the symlinks just won't be
> followed.
>
> I think we could even use the same technique to roll back the
> restrictions on .gitmodules being a symlink. That one makes me a bit
> more nervous, just because we also write it. I _think_ that might be
> safe, because we only do so using a temp file and rename(), which should
> replace the symlink.
>
>   [1/6]: add open_nofollow() helper
>   [2/6]: attr: convert "macro_ok" into a flags field
>   [3/6]: exclude: add flags parameter to add_patterns()
>   [4/6]: attr: do not respect symlinks for in-tree .gitattributes
>   [5/6]: exclude: do not respect symlinks for in-tree .gitignore
>   [6/6]: mailmap: do not respect symlinks for in-tree .mailmap
>
>  attr.c                    | 60 +++++++++++++++++++++++++--------------
>  builtin/sparse-checkout.c |  8 +++---
>  dir.c                     | 21 ++++++++++----
>  dir.h                     |  3 +-
>  git-compat-util.h         |  7 +++++
>  mailmap.c                 | 22 ++++++++++----
>  t/t0003-attributes.sh     | 36 +++++++++++++++++++++--
>  t/t0008-ignores.sh        | 34 ++++++++++++++++++++++
>  t/t4203-mailmap.sh        | 31 ++++++++++++++++++++
>  wrapper.c                 | 16 +++++++++++
>  10 files changed, 197 insertions(+), 41 deletions(-)

So, I've read these changes and they all looked quite reasonable.

Where do we want to go from here?

Merge it down and forget about the changes in verify_path() and fsck
in the jk/symlinked-dotgitx-files topic?  Do we want to also cover
the .gitmodules file with the same mechansim?

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/6] open in-tree files with O_NOFOLLOW
  2021-02-25 19:25             ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Junio C Hamano
@ 2021-02-26  6:35               ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2021-02-26  6:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Blake Burkhart, git

On Thu, Feb 25, 2021 at 11:25:19AM -0800, Junio C Hamano wrote:

> >   [1/6]: add open_nofollow() helper
> >   [2/6]: attr: convert "macro_ok" into a flags field
> >   [3/6]: exclude: add flags parameter to add_patterns()
> >   [4/6]: attr: do not respect symlinks for in-tree .gitattributes
> >   [5/6]: exclude: do not respect symlinks for in-tree .gitignore
> >   [6/6]: mailmap: do not respect symlinks for in-tree .mailmap
> [...]
> 
> So, I've read these changes and they all looked quite reasonable.
>
> Where do we want to go from here?
> 
> Merge it down and forget about the changes in verify_path() and fsck
> in the jk/symlinked-dotgitx-files topic?  Do we want to also cover
> the .gitmodules file with the same mechansim?

Thanks, I'm glad somebody looked at it. :)

Having pondered it, this really seems like a less risky approach than
forbidding symlinks for those paths. It is not impacting what is allowed
in Git, so nobody's repo will break. And we do not even have to worry
that our name-matching code is correct, since we are asking the OS to do
the right thing.

The biggest risk to me is that there is some hiccup with Windows: they
don't have any NOFOLLOW equivalent, and the fallback lstat() is somehow
slower than a real open(). But that seems unlikely (I could well believe
that lstat+open is slower for them, but that only matters if you
actually have a huge number of gitattributes files to open, in which
case you're probably spending your time reading and parsing them
anyway).

So I'd be happy to proceed with this and throw out
jk/symlinked-dotgitx-files. We can salvage the fsck checks from there,
leaving them as warnings, but it's not urgent (they are just
informational as "btw, this symlink won't work like you think it will").
So I'd probably do that as a separate series.

We could also cover .gitmodules, but I'm inclined not to. It's work and
risk to convert it to this form, for little gain. Nobody seems to have
been bothered by the symlink restriction. I guess it would take some
is_ntfs_gitmodules() checks out of the verify_path() code, which could
have some small performance benefit. But we'd definitely still need to
identify .gitmodules files in fsck, because we have to check over their
actual contents.

So I'd likewise be content to leave that to another series (or never if
nobody sees an upside to it).

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-02-26  6:41 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-13 17:49 Limited local file inclusion with .mailmap symlinks and git-archive Blake Burkhart
2021-02-15 23:17 ` Jeff King
2021-02-15 23:18   ` [PATCH 1/2] fsck: make symlinked .gitignore and .gitattributes a warning Jeff King
2021-02-16  0:38     ` Ævar Arnfjörð Bjarmason
2021-02-16  1:16       ` Jeff King
2021-02-16  1:56         ` Junio C Hamano
2021-02-16 12:54           ` Jeff King
2021-02-16 12:48         ` Jeff King
2021-02-16 14:43           ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Jeff King
2021-02-16 14:44             ` [PATCH 1/6] add open_nofollow() helper Jeff King
2021-02-16 14:54               ` Jeff King
2021-02-16 15:44                 ` Taylor Blau
2021-02-16 16:02                   ` Jeff King
2021-02-16 16:07                     ` Taylor Blau
2021-02-16 16:11                       ` Taylor Blau
2021-02-16 16:19                         ` Jeff King
2021-02-16 14:44             ` [PATCH 2/6] attr: convert "macro_ok" into a flags field Jeff King
2021-02-16 14:44             ` [PATCH 3/6] exclude: add flags parameter to add_patterns() Jeff King
2021-02-16 14:44             ` [PATCH 4/6] attr: do not respect symlinks for in-tree .gitattributes Jeff King
2021-02-16 14:44             ` [PATCH 5/6] exclude: do not respect symlinks for in-tree .gitignore Jeff King
2021-02-16 14:44             ` [PATCH 6/6] mailmap: do not respect symlinks for in-tree .mailmap Jeff King
2021-02-16 14:57               ` Jeff King
2021-02-25 19:25             ` [PATCH 0/6] open in-tree files with O_NOFOLLOW Junio C Hamano
2021-02-26  6:35               ` Jeff King
2021-02-15 23:19   ` [PATCH 2/2] disallow symlinked .mailmap files Jeff King

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).