git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes
@ 2019-12-26 17:42 Johannes Schindelin via GitGitGadget
  2019-12-26 17:42 ` [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-12-26 17:42 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano

As of Git for Windows, v2.24.1(2). cloning a repository that contained a
file with a backslash in its name some time in the past, the command will 
succeed but at the same time print errors like this:

    error: filename in tree entry contains backslash: '\'

A corresponding git fetch will also show those errors, but fail.

The reason is that v2.24.1 is much more strict about backslashes in tree
entries than earlier versions. The intention was actually to prevent 
checking out such files, though: if there was a mistake in a repository long 
ago that has been fixed long since, there is actually no reason why we
should require the history to be rewritten.

This fixes https://github.com/git-for-windows/git/issues/2435.

The idea of this patch is to stop complaining about tree entries, and focus
instead on the index: whenever a file is added to the index, we do not want
any backslashes in the file name on Windows.

As before, this check is only performed on Windows, and only under 
core.protectNTFS. On other platforms, even if core.protectNTFS is turned on,
the backslash is not a directory separator, therefore the Operating System's
syscalls will (should?) refuse to create files on NTFS with backslashes in
their file name.

I would appreciate reviews with a particular eye on keeping users safe: I am
not 100% certain that all relevant file writes go through the index (I think 
that they all go through the index, but I might have well missed a corner
case).

Johannes Schindelin (1):
  mingw: only test index entries for backslashes, not tree entries

 read-cache.c               | 5 +++++
 t/t7415-submodule-names.sh | 7 ++++---
 tree-walk.c                | 6 ------
 3 files changed, 9 insertions(+), 9 deletions(-)


base-commit: 12029dc57db23baef008e77db1909367599210ee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-682%2Fdscho%2Fmingw-only-error-on-backslash-in-index-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-682/dscho/mingw-only-error-on-backslash-in-index-v1
Pull-Request: https://github.com/git/git/pull/682
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 17:42 [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Johannes Schindelin via GitGitGadget
@ 2019-12-26 17:42 ` Johannes Schindelin via GitGitGadget
  2019-12-26 18:56   ` Junio C Hamano
  2019-12-26 20:03   ` Jonathan Nieder
  2019-12-26 19:22 ` [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Junio C Hamano
  2019-12-31 22:53 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  2 siblings, 2 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-12-26 17:42 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

During a clone of a repository that contained a file with a backslash in
its name in the past, as of v2.24.1(2), Git for Windows prints errors
like this:

	error: filename in tree entry contains backslash: '\'

While the clone still succeeds, a similar error prevents the equivalent
`git fetch` operation, which is inconsistent.

Arguably, this is the wrong layer for that error, anyway: As long as the
user never checks out the files whose names contain backslashes, there
should not be any problem in the first place.

So let's instead prevent such files to be added to the index.

This addresses https://github.com/git-for-windows/git/issues/2435

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c               | 5 +++++
 t/t7415-submodule-names.sh | 7 ++++---
 tree-walk.c                | 6 ------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index ad0b48c84d..737916ebd9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
+#ifdef GIT_WINDOWS_NATIVE
+	if (protect_ntfs && strchr(ce->name, '\\'))
+		return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
+#endif
+
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
 		cache_tree_invalidate_path(istate, ce->name);
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index 905a557585..7ae0dc8ff4 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -207,6 +207,9 @@ test_expect_success MINGW 'prevent git~1 squatting on Windows' '
 			git hash-object -w --stdin)" &&
 		rev="$(git rev-parse --verify HEAD)" &&
 		hash="$(echo x | git hash-object -w --stdin)" &&
+		test_must_fail git update-index --add \
+			--cacheinfo 160000,$rev,d\\a 2>err &&
+		test_i18ngrep backslash err &&
 		git -c core.protectNTFS=false update-index --add \
 			--cacheinfo 100644,$modules,.gitmodules \
 			--cacheinfo 160000,$rev,c \
@@ -214,9 +217,7 @@ test_expect_success MINGW 'prevent git~1 squatting on Windows' '
 			--cacheinfo 100644,$hash,d./a/x \
 			--cacheinfo 100644,$hash,d./a/..git &&
 		test_tick &&
-		git -c core.protectNTFS=false commit -m "module" &&
-		test_must_fail git show HEAD: 2>err &&
-		test_i18ngrep backslash err
+		git -c core.protectNTFS=false commit -m "module"
 	) &&
 	test_must_fail git -c core.protectNTFS=false \
 		clone --recurse-submodules squatting squatting-clone 2>err &&
diff --git a/tree-walk.c b/tree-walk.c
index b3d162051f..d5a8e096a6 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 		strbuf_addstr(err, _("empty filename in tree entry"));
 		return -1;
 	}
-#ifdef GIT_WINDOWS_NATIVE
-	if (protect_ntfs && strchr(path, '\\')) {
-		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
-		return -1;
-	}
-#endif
 	len = strlen(path) + 1;
 
 	/* Initialize the descriptor entry */
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 17:42 ` [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
@ 2019-12-26 18:56   ` Junio C Hamano
  2019-12-26 21:16     ` Johannes Schindelin
  2019-12-26 20:03   ` Jonathan Nieder
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2019-12-26 18:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> During a clone of a repository that contained a file with a backslash in
> its name in the past, as of v2.24.1(2), Git for Windows prints errors
> like this:
>
> 	error: filename in tree entry contains backslash: '\'
>
> While the clone still succeeds, a similar error prevents the equivalent
> `git fetch` operation, which is inconsistent.

Yes, inconsistent is bad and it is puzzling.  I would have expected,
if this gate on the transport layer is desirable, such a check would
be implemented as a part of transfer.fsckObjects and covered equally
by fetch and clone codepaths.  What went wrong to allow "clone" to
go through while stopping "fetch"?  Can you describe the root cause
of the difference in the log message?

> Arguably, this is the wrong layer for that error, anyway: As long as the
> user never checks out the files whose names contain backslashes, there
> should not be any problem in the first place.

I do agree that rejecting these tree objects that has a slash in its
path component is probably wrong.  A GIT_WINDOWS_NATIVE box should
be able to host a bare repository on it, and users on machines that
are OK with paths that Windows may not like should be able to
interact with it, by pushing to it, fetching from it, and updating
the repository on that Windows box by going there and fetching from
elsewhere.  Rejecting these names at the object validity level means
Git on Windows would be incompatible with Git elsewhere.

And It hink the same logic apply to those names like prn, con, nul,
etc.  How are the users protected from them?  We should prevent
these names from entering the index the same way, shouldn't we?

> So let's instead prevent such files to be added to the index.

... and loosen the check that (incorrectly) gets triggered from what
codepaths in "git fetch" (but not from "git clone")?

> This addresses https://github.com/git-for-windows/git/issues/2435
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  read-cache.c               | 5 +++++
>  t/t7415-submodule-names.sh | 7 ++++---
>  tree-walk.c                | 6 ------
>  3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index ad0b48c84d..737916ebd9 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
>  	int new_only = option & ADD_CACHE_NEW_ONLY;
>  
> +#ifdef GIT_WINDOWS_NATIVE
> +	if (protect_ntfs && strchr(ce->name, '\\'))

As I wondered above, names that must not enter the index may not be
limited to those with backslashes in them.  Perhaps you'd want a
separate helper function so that you can extend the logic more
easily, i.e.

	if (protect_ntfs && invalid_name_on_windows(ce->name))

or something like that.

> diff --git a/tree-walk.c b/tree-walk.c
> index b3d162051f..d5a8e096a6 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
>  		strbuf_addstr(err, _("empty filename in tree entry"));
>  		return -1;
>  	}
> -#ifdef GIT_WINDOWS_NATIVE
> -	if (protect_ntfs && strchr(path, '\\')) {
> -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> -		return -1;
> -	}
> -#endif
>  	len = strlen(path) + 1;
>  
>  	/* Initialize the descriptor entry */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes
  2019-12-26 17:42 [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Johannes Schindelin via GitGitGadget
  2019-12-26 17:42 ` [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
@ 2019-12-26 19:22 ` Junio C Hamano
  2019-12-26 21:19   ` Johannes Schindelin
  2019-12-31 22:53 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2019-12-26 19:22 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> I would appreciate reviews with a particular eye on keeping users safe: I am
> not 100% certain that all relevant file writes go through the index (I think 
> that they all go through the index, but I might have well missed a corner
> case).

There are peripheral commands that do not use the index at all, such
as "archive"; piping "git archive" output to unarchiver that writes
into the filesystem would be a way.  But I do not think that
qualifies as an attack vector you are looking for.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 17:42 ` [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
  2019-12-26 18:56   ` Junio C Hamano
@ 2019-12-26 20:03   ` Jonathan Nieder
  2019-12-26 21:23     ` Johannes Schindelin
  1 sibling, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2019-12-26 20:03 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Johannes Schindelin, Junio C Hamano

Hi,

Johannes Schindelin wrote:

> During a clone of a repository that contained a file with a backslash in
> its name in the past, as of v2.24.1(2), Git for Windows prints errors
> like this:
>
> 	error: filename in tree entry contains backslash: '\'
>
> While the clone still succeeds, a similar error prevents the equivalent
> `git fetch` operation, which is inconsistent.
>
> Arguably, this is the wrong layer for that error, anyway: As long as the
> user never checks out the files whose names contain backslashes, there
> should not be any problem in the first place.

Hm.  The choice of right layer depends on what repositories in the wild
contain.  If there are none containing filenames with '\', then fsck et
al would be an appropriate layer for this.  With hindsight, it was not
a good idea to support this kind of filename.

However, between the lines of this commit messages I sense that there
*are* repositories in the wild using these kinds of filenames.

Can you say more about that?  What repositories are affected?  Do they
contain such filenames at HEAD or only in their history?  If someone
wants to check out a revision with such filenames, what should happen?

> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
>  	int new_only = option & ADD_CACHE_NEW_ONLY;
>  
> +#ifdef GIT_WINDOWS_NATIVE
> +	if (protect_ntfs && strchr(ce->name, '\\'))
> +		return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> +#endif
> +

Why is this specific to the GIT_WINDOWS_NATIVE case?  Wouldn't it affect
ntfs usage on other platforms as well?

[...]
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
>  		strbuf_addstr(err, _("empty filename in tree entry"));
>  		return -1;
>  	}
> -#ifdef GIT_WINDOWS_NATIVE
> -	if (protect_ntfs && strchr(path, '\\')) {
> -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> -		return -1;
> -	}
> -#endif

Ah, it's inherited from there, so orthogonal to this patch.

To summarize: I think the commit message and docs could use some work
to describe what invariants we're trying to maintain and what
real-world usage motivates them.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 18:56   ` Junio C Hamano
@ 2019-12-26 21:16     ` Johannes Schindelin
  2019-12-30 21:57       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2019-12-26 21:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git

Hi Junio,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > During a clone of a repository that contained a file with a backslash in
> > its name in the past, as of v2.24.1(2), Git for Windows prints errors
> > like this:
> >
> > 	error: filename in tree entry contains backslash: '\'
> >
> > While the clone still succeeds, a similar error prevents the equivalent
> > `git fetch` operation, which is inconsistent.
>
> Yes, inconsistent is bad and it is puzzling.  I would have expected,
> if this gate on the transport layer is desirable, such a check would
> be implemented as a part of transfer.fsckObjects and covered equally
> by fetch and clone codepaths.  What went wrong to allow "clone" to
> go through while stopping "fetch"?  Can you describe the root cause
> of the difference in the log message?

My bad, I should have root-caused this better.

Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
but not in current `master` of Git, so I simply struck that part from the
commit message.

> > Arguably, this is the wrong layer for that error, anyway: As long as
> > the user never checks out the files whose names contain backslashes,
> > there should not be any problem in the first place.
>
> I do agree that rejecting these tree objects that has a slash in its
> path component is probably wrong.  A GIT_WINDOWS_NATIVE box should
> be able to host a bare repository on it, and users on machines that
> are OK with paths that Windows may not like should be able to
> interact with it, by pushing to it, fetching from it, and updating
> the repository on that Windows box by going there and fetching from
> elsewhere.  Rejecting these names at the object validity level means
> Git on Windows would be incompatible with Git elsewhere.
>
> And It hink the same logic apply to those names like prn, con, nul,
> etc.  How are the users protected from them?  We should prevent
> these names from entering the index the same way, shouldn't we?
>
> > So let's instead prevent such files to be added to the index.
>
> ... and loosen the check that (incorrectly) gets triggered from what
> codepaths in "git fetch" (but not from "git clone")?

I rephrased it to:

    So let's loosen the requirements: we now leave tree entries with
    backslashes in their file names alone, but we do require any entries
    that are added to the Git index to contain no backslashes on Windows.

> > This addresses https://github.com/git-for-windows/git/issues/2435
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  read-cache.c               | 5 +++++
> >  t/t7415-submodule-names.sh | 7 ++++---
> >  tree-walk.c                | 6 ------
> >  3 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/read-cache.c b/read-cache.c
> > index ad0b48c84d..737916ebd9 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> >  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> >  	int new_only = option & ADD_CACHE_NEW_ONLY;
> >
> > +#ifdef GIT_WINDOWS_NATIVE
> > +	if (protect_ntfs && strchr(ce->name, '\\'))
>
> As I wondered above, names that must not enter the index may not be
> limited to those with backslashes in them.  Perhaps you'd want a
> separate helper function so that you can extend the logic more
> easily, i.e.
>
> 	if (protect_ntfs && invalid_name_on_windows(ce->name))
>
> or something like that.

I decided to perform those checks at yet another layer: when trying to
create new files. My idea was that I would want to catch even things like
`git config -f LPT1 ...` (`LPT1` is a reserved name on Windows, you cannot
create a file with that name).

Obviously, I cannot handle the backslash in the same code path, as e.g.
`git config -f C:\Users\me\.gitconfig ...` is totally valid.

Ciao,
Dscho

> > diff --git a/tree-walk.c b/tree-walk.c
> > index b3d162051f..d5a8e096a6 100644
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> >  		strbuf_addstr(err, _("empty filename in tree entry"));
> >  		return -1;
> >  	}
> > -#ifdef GIT_WINDOWS_NATIVE
> > -	if (protect_ntfs && strchr(path, '\\')) {
> > -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> > -		return -1;
> > -	}
> > -#endif
> >  	len = strlen(path) + 1;
> >
> >  	/* Initialize the descriptor entry */
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes
  2019-12-26 19:22 ` [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Junio C Hamano
@ 2019-12-26 21:19   ` Johannes Schindelin
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin @ 2019-12-26 21:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git

Hi Junio,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > I would appreciate reviews with a particular eye on keeping users safe: I am
> > not 100% certain that all relevant file writes go through the index (I think
> > that they all go through the index, but I might have well missed a corner
> > case).
>
> There are peripheral commands that do not use the index at all, such
> as "archive"; piping "git archive" output to unarchiver that writes
> into the filesystem would be a way.  But I do not think that
> qualifies as an attack vector you are looking for.

Yes, I thought about `git archive`, too. The thing is, I could imagine
legitimate use cases where a user wants to generate e.g. a `.zip` on
Windows, with the intention to unpack it on Linux. In such a case, we
would not want to prevent said `.zip` from being generated.

I also briefly considered the scripts that write some trees into a
temporary location, but those scripts typically use `git read-tree -u -m`
with a temporary index, i.e. they _do_ go through the index, still.

Thank you for your thoughts about this issue,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 20:03   ` Jonathan Nieder
@ 2019-12-26 21:23     ` Johannes Schindelin
  2019-12-26 21:42       ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2019-12-26 21:23 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Jonathan,

On Thu, 26 Dec 2019, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
>
> > During a clone of a repository that contained a file with a backslash in
> > its name in the past, as of v2.24.1(2), Git for Windows prints errors
> > like this:
> >
> > 	error: filename in tree entry contains backslash: '\'
> >
> > While the clone still succeeds, a similar error prevents the equivalent
> > `git fetch` operation, which is inconsistent.
> >
> > Arguably, this is the wrong layer for that error, anyway: As long as the
> > user never checks out the files whose names contain backslashes, there
> > should not be any problem in the first place.
>
> Hm.  The choice of right layer depends on what repositories in the wild
> contain.  If there are none containing filenames with '\', then fsck et
> al would be an appropriate layer for this.  With hindsight, it was not
> a good idea to support this kind of filename.
>
> However, between the lines of this commit messages I sense that there
> *are* repositories in the wild using these kinds of filenames.
>
> Can you say more about that?  What repositories are affected?  Do they
> contain such filenames at HEAD or only in their history?  If someone
> wants to check out a revision with such filenames, what should happen?

Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
at some stage, by mistake, a directory was called `\`. It has been fixed a
long time ago, but users obviously still want to be able to clone it ;-)

> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> >  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> >  	int new_only = option & ADD_CACHE_NEW_ONLY;
> >
> > +#ifdef GIT_WINDOWS_NATIVE
> > +	if (protect_ntfs && strchr(ce->name, '\\'))
> > +		return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> > +#endif
> > +
>
> Why is this specific to the GIT_WINDOWS_NATIVE case?  Wouldn't it affect
> ntfs usage on other platforms as well?
>
> [...]
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> >  		strbuf_addstr(err, _("empty filename in tree entry"));
> >  		return -1;
> >  	}
> > -#ifdef GIT_WINDOWS_NATIVE
> > -	if (protect_ntfs && strchr(path, '\\')) {
> > -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> > -		return -1;
> > -	}
> > -#endif
>
> Ah, it's inherited from there, so orthogonal to this patch.
>
> To summarize: I think the commit message and docs could use some work
> to describe what invariants we're trying to maintain and what
> real-world usage motivates them.

I added this paragraph to the commit message:

    Note: just as before, the check is guarded by `core.protectNTFS` (to
    allow overriding the check by toggling that config setting), and it
    is _only_ performed on Windows, as the backslash is not a directory
    separator elsewhere, even when writing to NTFS-formatted volumes.

Does that clarify the issue enough?
Dscho

>
> Thanks and hope that helps,
> Jonathan
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 21:23     ` Johannes Schindelin
@ 2019-12-26 21:42       ` Jonathan Nieder
  2019-12-26 22:01         ` Junio C Hamano
  2020-01-02 19:58         ` Johannes Schindelin
  0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Nieder @ 2019-12-26 21:42 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi,

Johannes Schindelin wrote:
> On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>> Johannes Schindelin wrote:

>>> Arguably, this is the wrong layer for that error, anyway: As long as the
>>> user never checks out the files whose names contain backslashes, there
>>> should not be any problem in the first place.
[...]
>>          between the lines of this commit messages I sense that there
>> *are* repositories in the wild using these kinds of filenames.
>>
>> Can you say more about that?  What repositories are affected?  Do they
>> contain such filenames at HEAD or only in their history?  If someone
>> wants to check out a revision with such filenames, what should happen?
>
> Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> at some stage, by mistake, a directory was called `\`. It has been fixed a
> long time ago, but users obviously still want to be able to clone it ;-)

Thanks.

What should and does happen if someone tries to check out an offending
revision?

[...]
> I added this paragraph to the commit message:
>
>     Note: just as before, the check is guarded by `core.protectNTFS` (to
>     allow overriding the check by toggling that config setting), and it
>     is _only_ performed on Windows, as the backslash is not a directory
>     separator elsewhere, even when writing to NTFS-formatted volumes.
>
> Does that clarify the issue enough?

It helps: that tells me why the check is only performed on Windows.

Since this only affects Windows, please only take this review as data
about how someone read the patch.  The patch doesn't make non Windows
specific code any *less* maintainable, and as a general presumption I
assume that the Git for Windows maintainer knows best about what is
needed for maintainability of Windows-specific code.

But the commit message and docs still don't describe what the desired
behavior is.  For example, should I be able to check out a revision
with a backslash in it (e.g. via Git skipping the offending entry), or
is the intended behavior for it to error out and prevent checking out
such versions of code?

Is there anything we can or should do to prevent people checking in
new examples of paths with backslash in them (on all platforms)?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 21:42       ` Jonathan Nieder
@ 2019-12-26 22:01         ` Junio C Hamano
  2019-12-26 22:25           ` Junio C Hamano
  2020-01-02 19:58         ` Johannes Schindelin
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2019-12-26 22:01 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?

I obviously won't dictate what should happen on Windows, but I think
the overall principle for paths recorded in a tree object that can
be problematic on some of the platforms ought to be:

 * fsck and transfer.fsckobjects should be taught to notice
   offending characteristics (e.g. has a backslash in it, is one of
   the "reserved names" on some platform like LPT1).

 * if paths with the offending characteristics are *so* obviously
   useless in real life and are possible only in a crafted path that
   is only useful to attack users, the check in fsck should default
   to "reject" to help the disease spread via hosting sites.

 * otherwise, the check should be to "warn" but not "reject", so
   that projects can keep using paths that may problematic on
   platforms that do not matter to them.

I think LPT1 and friends fall into the "warning is fine" category,
and a path component that contains a backslash would fall into the
"this is an attack, just reject" category.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 22:01         ` Junio C Hamano
@ 2019-12-26 22:25           ` Junio C Hamano
  2019-12-31 22:51             ` Johannes Schindelin
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2019-12-26 22:25 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>> Is there anything we can or should do to prevent people checking in
>> new examples of paths with backslash in them (on all platforms)?
>
> I obviously won't dictate what should happen on Windows, but I think
> the overall principle for paths recorded in a tree object that can
> be problematic on some of the platforms ought to be:
>
>  * fsck and transfer.fsckobjects should be taught to notice
>    offending characteristics (e.g. has a backslash in it, is one of
>    the "reserved names" on some platform like LPT1).
>
>  * if paths with the offending characteristics are *so* obviously
>    useless in real life and are possible only in a crafted path that
>    is only useful to attack users, the check in fsck should default
>    to "reject" to help the disease spread via hosting sites.
>
>  * otherwise, the check should be to "warn" but not "reject", so
>    that projects can keep using paths that may problematic on
>    platforms that do not matter to them.
>
> I think LPT1 and friends fall into the "warning is fine" category,
> and a path component that contains a backslash would fall into the
> "this is an attack, just reject" category.

I guess I should have stepped back a bit.

In the message I am responding to, I focused solely on how tree
objects that originate elsewhere should be treated, but there are
two more data paths we need to worry about:

 * A new path gets introduced to the system, via "update-index",
   "add", etc., to the index and then "write-tree" to form tree
   objects in the local repository.

 * A path in the index, either created locally via "update-index",
   "add", etc., or read from a tree object, gets written to the
   local filesystem, resulting in an attempt to create a path that
   the local filesystem cannot represent (or worse---behaves badly,
   like "sending random garbage to the printer").

I think we should apply the same principle as the one I outlined for
the tree objects.  The fsckobjects mechanism may not be reusable to
catch violations in add_index_entry_with_check() as-is, but we need
to aim for as much reuse of logic and code as possible so that our
checks at various layers all behave consistently.

Thanks.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 21:16     ` Johannes Schindelin
@ 2019-12-30 21:57       ` Junio C Hamano
  2020-01-02 19:53         ` Johannes Schindelin
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2019-12-30 21:57 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Schindelin via GitGitGadget, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> but not in current `master` of Git, so I simply struck that part from the
> commit message.
> ...
> I rephrased it to:
>
>     So let's loosen the requirements: we now leave tree entries with
>     backslashes in their file names alone, but we do require any entries
>     that are added to the Git index to contain no backslashes on Windows.
> ...

We are in -rc so there is no real rush, but I take these to mean
that I should just leave this loose end hanging untied, and wait
for an updated version to replace it sometime early next year.

Thanks and happy new year.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 22:25           ` Junio C Hamano
@ 2019-12-31 22:51             ` Johannes Schindelin
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin @ 2019-12-31 22:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Johannes Schindelin via GitGitGadget, git

Hi Junio & Jonathan,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Jonathan Nieder <jrnieder@gmail.com> writes:
> >
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > I obviously won't dictate what should happen on Windows, but I think
> > the overall principle for paths recorded in a tree object that can
> > be problematic on some of the platforms ought to be:
> >
> >  * fsck and transfer.fsckobjects should be taught to notice
> >    offending characteristics (e.g. has a backslash in it, is one of
> >    the "reserved names" on some platform like LPT1).

Agree. This is on my radar, but so far not too-high priority, as the
`fsck` checks are not (yet?) standard practice in PR builds (and warnings
on the server side are prone to be ignored).

> >  * if paths with the offending characteristics are *so* obviously
> >    useless in real life and are possible only in a crafted path that
> >    is only useful to attack users, the check in fsck should default
> >    to "reject" to help the disease spread via hosting sites.

I don't think that reserved names such as `aux`, nor names containing
backslashes should be rejected _always_. While I cannot think of _any_
instance where I would want to have a backslash in a file name, I am sure
that just like `aux.c`, there _must_ be somebody out there who thought of
a file name that contains a backslash and makes at least some sort of
sense.

> >  * otherwise, the check should be to "warn" but not "reject", so
> >    that projects can keep using paths that may problematic on
> >    platforms that do not matter to them.

Yes, it should be "warn".

> > I think LPT1 and friends fall into the "warning is fine" category,
> > and a path component that contains a backslash would fall into the
> > "this is an attack, just reject" category.
>
> I guess I should have stepped back a bit.
>
> In the message I am responding to, I focused solely on how tree
> objects that originate elsewhere should be treated, but there are
> two more data paths we need to worry about:
>
>  * A new path gets introduced to the system, via "update-index",
>    "add", etc., to the index and then "write-tree" to form tree
>    objects in the local repository.

Right, that's what I had in mind when I wrote this patch. The path gets
added to the index, we detect a backslash, and on Windows (under
`core.protectNTFS`) fail with an error.

>  * A path in the index, either created locally via "update-index",
>    "add", etc., or read from a tree object, gets written to the
>    local filesystem, resulting in an attempt to create a path that
>    the local filesystem cannot represent (or worse---behaves badly,
>    like "sending random garbage to the printer").

Happily, my patch seems to catch this code path, too: when reading from a
`tree` object into the index, we use `add_index_entry()` (called via
various code paths in the `unpack_trees()` machinery). That's exactly the
patched function.

Or maybe you know of a code path in the `unpack_trees()` machinery that
does _not_ go through `add_index_entry()`? I would be very interested to
learn about such code paths.

> I think we should apply the same principle as the one I outlined for
> the tree objects.  The fsckobjects mechanism may not be reusable to
> catch violations in add_index_entry_with_check() as-is, but we need
> to aim for as much reuse of logic and code as possible so that our
> checks at various layers all behave consistently.

I am afraid that we won't be able to reuse code paths for checking the
backslash here, but for reserved names I am planning on refactoring the
code accordingly.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 0/1] Disallow writing, but not fetching commits with file names containing backslashes
  2019-12-26 17:42 [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Johannes Schindelin via GitGitGadget
  2019-12-26 17:42 ` [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
  2019-12-26 19:22 ` [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Junio C Hamano
@ 2019-12-31 22:53 ` Johannes Schindelin via GitGitGadget
  2019-12-31 22:53   ` [PATCH v2 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-12-31 22:53 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano

As of Git for Windows, v2.24.1(2). cloning a repository that contained a
file with a backslash in its name some time in the past, the command will 
succeed but at the same time print errors like this:

    error: filename in tree entry contains backslash: '\'

A corresponding git fetch will also show those errors, but fail.

The reason is that v2.24.1 is much more strict about backslashes in tree
entries than earlier versions. The intention was actually to prevent 
checking out such files, though: if there was a mistake in a repository long 
ago that has been fixed long since, there is actually no reason why we
should require the history to be rewritten.

This fixes https://github.com/git-for-windows/git/issues/2435.

The idea of this patch is to stop complaining about tree entries, and focus
instead on the index: whenever a file is added to the index, we do not want
any backslashes in the file name on Windows.

As before, this check is only performed on Windows, and only under 
core.protectNTFS. On other platforms, even if core.protectNTFS is turned on,
the backslash is not a directory separator, therefore the Operating System's
syscalls will (should?) refuse to create files on NTFS with backslashes in
their file name.

I would appreciate reviews with a particular eye on keeping users safe: I am
not 100% certain that all relevant file writes go through the index (I think 
that they all go through the index, but I might have well missed a corner
case).

Changes since v1:

 * Clarified the commit message (what is the goal, explain that the
   requirement is now loosened, explain why the code is still 
   GIT_WINDOWS_NATIVE-only, etc).

Johannes Schindelin (1):
  mingw: only test index entries for backslashes, not tree entries

 read-cache.c               | 5 +++++
 t/t7415-submodule-names.sh | 7 ++++---
 tree-walk.c                | 6 ------
 3 files changed, 9 insertions(+), 9 deletions(-)


base-commit: 12029dc57db23baef008e77db1909367599210ee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-682%2Fdscho%2Fmingw-only-error-on-backslash-in-index-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-682/dscho/mingw-only-error-on-backslash-in-index-v2
Pull-Request: https://github.com/git/git/pull/682

Range-diff vs v1:

 1:  4a120fd0b3 ! 1:  d6da8315d3 mingw: only test index entries for backslashes, not tree entries
     @@ -8,14 +8,31 @@
      
                  error: filename in tree entry contains backslash: '\'
      
     -    While the clone still succeeds, a similar error prevents the equivalent
     -    `git fetch` operation, which is inconsistent.
     +    The idea is to prevent Git from even trying to write files with
     +    backslashes in their file names: while these characters are valid in
     +    file names on other platforms, on Windows it is interpreted as directory
     +    separator (which would obviously lead to ambiguities, e.g. when there is
     +    a file `a\b` and there is also a file `a/b`).
      
     -    Arguably, this is the wrong layer for that error, anyway: As long as the
     -    user never checks out the files whose names contain backslashes, there
     -    should not be any problem in the first place.
     +    Arguably, this is the wrong layer for that error: As long as the user
     +    never checks out the files whose names contain backslashes, there should
     +    not be any problem in the first place.
      
     -    So let's instead prevent such files to be added to the index.
     +    So let's loosen the requirements: we now leave tree entries with
     +    backslashes in their file names alone, but we do require any entries
     +    that are added to the Git index to contain no backslashes on Windows.
     +
     +    Note: just as before, the check is guarded by `core.protectNTFS` (to
     +    allow overriding the check by toggling that config setting), and it
     +    is _only_ performed on Windows, as the backslash is not a directory
     +    separator elsewhere, even when writing to NTFS-formatted volumes.
     +
     +    An alternative approach would be to try to prevent creating files with
     +    backslashes in their file names. However, that comes with its own set of
     +    problems. For example, `git config -f C:\ProgramData\Git\config ...` is
     +    a very valid way to specify a custom config location, and we obviously
     +    do _not_ want to prevent that. Therefore, the approach chosen in this
     +    patch would appear to be better.
      
          This addresses https://github.com/git-for-windows/git/issues/2435
      

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-31 22:53 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
@ 2019-12-31 22:53   ` Johannes Schindelin via GitGitGadget
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-12-31 22:53 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

During a clone of a repository that contained a file with a backslash in
its name in the past, as of v2.24.1(2), Git for Windows prints errors
like this:

	error: filename in tree entry contains backslash: '\'

The idea is to prevent Git from even trying to write files with
backslashes in their file names: while these characters are valid in
file names on other platforms, on Windows it is interpreted as directory
separator (which would obviously lead to ambiguities, e.g. when there is
a file `a\b` and there is also a file `a/b`).

Arguably, this is the wrong layer for that error: As long as the user
never checks out the files whose names contain backslashes, there should
not be any problem in the first place.

So let's loosen the requirements: we now leave tree entries with
backslashes in their file names alone, but we do require any entries
that are added to the Git index to contain no backslashes on Windows.

Note: just as before, the check is guarded by `core.protectNTFS` (to
allow overriding the check by toggling that config setting), and it
is _only_ performed on Windows, as the backslash is not a directory
separator elsewhere, even when writing to NTFS-formatted volumes.

An alternative approach would be to try to prevent creating files with
backslashes in their file names. However, that comes with its own set of
problems. For example, `git config -f C:\ProgramData\Git\config ...` is
a very valid way to specify a custom config location, and we obviously
do _not_ want to prevent that. Therefore, the approach chosen in this
patch would appear to be better.

This addresses https://github.com/git-for-windows/git/issues/2435

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c               | 5 +++++
 t/t7415-submodule-names.sh | 7 ++++---
 tree-walk.c                | 6 ------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index ad0b48c84d..737916ebd9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
+#ifdef GIT_WINDOWS_NATIVE
+	if (protect_ntfs && strchr(ce->name, '\\'))
+		return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
+#endif
+
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
 		cache_tree_invalidate_path(istate, ce->name);
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index 905a557585..7ae0dc8ff4 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -207,6 +207,9 @@ test_expect_success MINGW 'prevent git~1 squatting on Windows' '
 			git hash-object -w --stdin)" &&
 		rev="$(git rev-parse --verify HEAD)" &&
 		hash="$(echo x | git hash-object -w --stdin)" &&
+		test_must_fail git update-index --add \
+			--cacheinfo 160000,$rev,d\\a 2>err &&
+		test_i18ngrep backslash err &&
 		git -c core.protectNTFS=false update-index --add \
 			--cacheinfo 100644,$modules,.gitmodules \
 			--cacheinfo 160000,$rev,c \
@@ -214,9 +217,7 @@ test_expect_success MINGW 'prevent git~1 squatting on Windows' '
 			--cacheinfo 100644,$hash,d./a/x \
 			--cacheinfo 100644,$hash,d./a/..git &&
 		test_tick &&
-		git -c core.protectNTFS=false commit -m "module" &&
-		test_must_fail git show HEAD: 2>err &&
-		test_i18ngrep backslash err
+		git -c core.protectNTFS=false commit -m "module"
 	) &&
 	test_must_fail git -c core.protectNTFS=false \
 		clone --recurse-submodules squatting squatting-clone 2>err &&
diff --git a/tree-walk.c b/tree-walk.c
index b3d162051f..d5a8e096a6 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 		strbuf_addstr(err, _("empty filename in tree entry"));
 		return -1;
 	}
-#ifdef GIT_WINDOWS_NATIVE
-	if (protect_ntfs && strchr(path, '\\')) {
-		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
-		return -1;
-	}
-#endif
 	len = strlen(path) + 1;
 
 	/* Initialize the descriptor entry */
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-30 21:57       ` Junio C Hamano
@ 2020-01-02 19:53         ` Johannes Schindelin
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin @ 2020-01-02 19:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git

Hi Junio,

On Mon, 30 Dec 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> > but not in current `master` of Git, so I simply struck that part from the
> > commit message.
> > ...
> > I rephrased it to:
> >
> >     So let's loosen the requirements: we now leave tree entries with
> >     backslashes in their file names alone, but we do require any entries
> >     that are added to the Git index to contain no backslashes on Windows.
> > ...
>
> We are in -rc so there is no real rush,

My intention was actually to integrate this patch into Git for Windows
v2.25.0-rc1 already, that's why I sent it out for review even this late in
the -rc phase.

> but I take these to mean that I should just leave this loose end hanging
> untied, and wait for an updated version to replace it sometime early
> next year.

I sent out a new iteration of the patch *just* before the new year arrived
over here ;-)

> Thanks and happy new year.

The same to you!
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2019-12-26 21:42       ` Jonathan Nieder
  2019-12-26 22:01         ` Junio C Hamano
@ 2020-01-02 19:58         ` Johannes Schindelin
  2020-01-04  1:57           ` Jonathan Nieder
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2020-01-02 19:58 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Jonathan,

On Thu, 26 Dec 2019, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
> >> Johannes Schindelin wrote:
>
> >>> Arguably, this is the wrong layer for that error, anyway: As long as the
> >>> user never checks out the files whose names contain backslashes, there
> >>> should not be any problem in the first place.
> [...]
> >>          between the lines of this commit messages I sense that there
> >> *are* repositories in the wild using these kinds of filenames.
> >>
> >> Can you say more about that?  What repositories are affected?  Do they
> >> contain such filenames at HEAD or only in their history?  If someone
> >> wants to check out a revision with such filenames, what should happen?
> >
> > Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> > at some stage, by mistake, a directory was called `\`. It has been fixed a
> > long time ago, but users obviously still want to be able to clone it ;-)
>
> Thanks.
>
> What should and does happen if someone tries to check out an offending
> revision?

To answer this question, I added this paragraph to the commit message:

    The idea is to prevent Git from even trying to write files with
    backslashes in their file names: while these characters are valid in
    file names on other platforms, on Windows it is interpreted as directory
    separator (which would obviously lead to ambiguities, e.g. when there is
    a file `a\b` and there is also a file `a/b`).

> [...]
> > I added this paragraph to the commit message:
> >
> >     Note: just as before, the check is guarded by `core.protectNTFS` (to
> >     allow overriding the check by toggling that config setting), and it
> >     is _only_ performed on Windows, as the backslash is not a directory
> >     separator elsewhere, even when writing to NTFS-formatted volumes.
> >
> > Does that clarify the issue enough?
>
> It helps: that tells me why the check is only performed on Windows.
>
> Since this only affects Windows, please only take this review as data
> about how someone read the patch.  The patch doesn't make non Windows
> specific code any *less* maintainable, and as a general presumption I
> assume that the Git for Windows maintainer knows best about what is
> needed for maintainability of Windows-specific code.
>
> But the commit message and docs still don't describe what the desired
> behavior is.  For example, should I be able to check out a revision
> with a backslash in it (e.g. via Git skipping the offending entry), or
> is the intended behavior for it to error out and prevent checking out
> such versions of code?

As mentioned above, the idea is to prevent Git from attempting to create
files with illegal file name characters.

> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?

As mentioned in my reply to Junio, I don't think that it would be wise to
even try to warn about backslashes in the file names. There are _so_ many
Git users out there, I am convinced that at least some of them have valid
use cases for file names with backslashes in them.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2020-01-02 19:58         ` Johannes Schindelin
@ 2020-01-04  1:57           ` Jonathan Nieder
  2020-01-04 21:29             ` Johannes Schindelin
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2020-01-04  1:57 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Johannes Schindelin wrote:

> As mentioned above, the idea is to prevent Git from attempting to create
> files with illegal file name characters.
[...]
> On Thu, 26 Dec 2019, Jonathan Nieder wrote:

>> Is there anything we can or should do to prevent people checking in
>> new examples of paths with backslash in them (on all platforms)?
>
> As mentioned in my reply to Junio, I don't think that it would be wise to
> even try to warn about backslashes in the file names. There are _so_ many
> Git users out there, I am convinced that at least some of them have valid
> use cases for file names with backslashes in them.

Thanks for the quick answers.  It helps.

I think allowing people to clone
https://github.com/zephyrproject-rtos/civetweb but not to check out
the problematic historic revision is a reasonable choice, especially
since it's still possible to get the data from there using

	git checkout <rev> -- . ':!bad-paths'

[...]
> Or maybe you know of a code path in the `unpack_trees()` machinery that
> does _not_ go through `add_index_entry()`? I would be very interested to
> learn about such code paths.

Every once in a while someone (e.g., in #git) has wanted "git checkout
--skip-index <rev> -- <paths>", and that would be the natural way to
implement such a thing.  But no one has done it yet. :)

We'll just have to keep a watchful eye as people make new
contributions.

Sincerely,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
  2020-01-04  1:57           ` Jonathan Nieder
@ 2020-01-04 21:29             ` Johannes Schindelin
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin @ 2020-01-04 21:29 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Jonathan,

On Fri, 3 Jan 2020, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
>
> > As mentioned above, the idea is to prevent Git from attempting to create
> > files with illegal file name characters.
> [...]
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > As mentioned in my reply to Junio, I don't think that it would be wise to
> > even try to warn about backslashes in the file names. There are _so_ many
> > Git users out there, I am convinced that at least some of them have valid
> > use cases for file names with backslashes in them.
>
> Thanks for the quick answers.  It helps.
>
> I think allowing people to clone
> https://github.com/zephyrproject-rtos/civetweb but not to check out
> the problematic historic revision is a reasonable choice, especially
> since it's still possible to get the data from there using
>
> 	git checkout <rev> -- . ':!bad-paths'
>
> [...]
> > Or maybe you know of a code path in the `unpack_trees()` machinery that
> > does _not_ go through `add_index_entry()`? I would be very interested to
> > learn about such code paths.
>
> Every once in a while someone (e.g., in #git) has wanted "git checkout
> --skip-index <rev> -- <paths>", and that would be the natural way to
> implement such a thing.  But no one has done it yet. :)
>
> We'll just have to keep a watchful eye as people make new
> contributions.

Right.

As to my patch, in the meantime I wondered whether I should simply extend
`verify_path()` to take a `flags` parameter where I can specify whether
this is intended for checkout, and handle the backslashes _there_. That
would cover e.g. uses of `make_transient_cache_entry()`, too.

The only reason why I would want to introduce `flags` is that `git apply`
has this `--unsafe-paths` option where `verify_paths()` is called on the
file names in the diff, and I was unsure whether there are legitimate use
cases with diffs containing DOS-style paths (I suspect that there are
legitimate scenarios where this is desirable).

What do you think of that?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-01-04 21:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-26 17:42 [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Johannes Schindelin via GitGitGadget
2019-12-26 17:42 ` [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget
2019-12-26 18:56   ` Junio C Hamano
2019-12-26 21:16     ` Johannes Schindelin
2019-12-30 21:57       ` Junio C Hamano
2020-01-02 19:53         ` Johannes Schindelin
2019-12-26 20:03   ` Jonathan Nieder
2019-12-26 21:23     ` Johannes Schindelin
2019-12-26 21:42       ` Jonathan Nieder
2019-12-26 22:01         ` Junio C Hamano
2019-12-26 22:25           ` Junio C Hamano
2019-12-31 22:51             ` Johannes Schindelin
2020-01-02 19:58         ` Johannes Schindelin
2020-01-04  1:57           ` Jonathan Nieder
2020-01-04 21:29             ` Johannes Schindelin
2019-12-26 19:22 ` [PATCH 0/1] Disallow writing, but not fetching commits with file names containing backslashes Junio C Hamano
2019-12-26 21:19   ` Johannes Schindelin
2019-12-31 22:53 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2019-12-31 22:53   ` [PATCH v2 1/1] mingw: only test index entries for backslashes, not tree entries Johannes Schindelin via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).