git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Git clone and case sensitivity
@ 2018-07-27  9:59 Paweł Paruzel
  2018-07-27 20:59 ` brian m. carlson
  0 siblings, 1 reply; 96+ messages in thread
From: Paweł Paruzel @ 2018-07-27  9:59 UTC (permalink / raw)
  To: git

Hi,

Lately, I have been wondering why my test files in repo are modified after I clone it. It turned out to be two files: boolStyle_t_f and boolStyle_T_F.
The system that pushed those files was case sensitive while my mac after High Sierra update had APFS which is by default case-insensitive. I highly suggest that git clone threw an exception when files are case sensitive and being cloned to a case insensitive system. This has caused problems with overriding files for test cases without any warning.

Thanks in advance.
Regards,
Pawel Paruzel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-27  9:59 Git clone and case sensitivity Paweł Paruzel
@ 2018-07-27 20:59 ` brian m. carlson
  2018-07-28  4:36   ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: brian m. carlson @ 2018-07-27 20:59 UTC (permalink / raw)
  To: Paweł Paruzel; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1399 bytes --]

On Fri, Jul 27, 2018 at 11:59:33AM +0200, Paweł Paruzel wrote:
> Hi,
> 
> Lately, I have been wondering why my test files in repo are modified
> after I clone it. It turned out to be two files: boolStyle_t_f and
> boolStyle_T_F.
> The system that pushed those files was case sensitive while my mac
> after High Sierra update had APFS which is by default
> case-insensitive. I highly suggest that git clone threw an exception
> when files are case sensitive and being cloned to a case insensitive
> system. This has caused problems with overriding files for test cases
> without any warning.

If we did what you proposed, it would be impossible to clone such a
repository on a case-insensitive system.  While this might be fine for a
closed system such as inside a company, this would make many open source
repositories unusable, even when the files differing in case are
nonfunctional (like README and readme).

This is actually one of a few ways people can make repositories that
will show as modified on Windows or macOS systems, due to limitations in
those OSes.  If you want to be sure that your repository is unmodified
after clone, you can ensure that the output of git status --porcelain is
empty, such as by checking for a zero exit from
"test $(git status --porcelain | wc -l) -eq 0".
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-27 20:59 ` brian m. carlson
@ 2018-07-28  4:36   ` Duy Nguyen
  2018-07-28  4:45     ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: Duy Nguyen @ 2018-07-28  4:36 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Paweł Paruzel, git

On Fri, Jul 27, 2018 at 08:59:09PM +0000, brian m. carlson wrote:
> On Fri, Jul 27, 2018 at 11:59:33AM +0200, Paweł Paruzel wrote:
> > Hi,
> > 
> > Lately, I have been wondering why my test files in repo are modified
> > after I clone it. It turned out to be two files: boolStyle_t_f and
> > boolStyle_T_F.
> > The system that pushed those files was case sensitive while my mac
> > after High Sierra update had APFS which is by default
> > case-insensitive. I highly suggest that git clone threw an exception
> > when files are case sensitive and being cloned to a case insensitive
> > system. This has caused problems with overriding files for test cases
> > without any warning.
> 
> If we did what you proposed, it would be impossible to clone such a
> repository on a case-insensitive system.

I agree throwing a real exception would be bad. But how about detecting
the problem and trying our best to keep the repo in somewhat usable
state like this?

This patch uses sparse checkout to hide all those paths that we fail
to checkout, so you can still have a clean worktree to do things, as
long as you don't touch those paths.

-- 8< --
diff --git a/builtin/clone.c b/builtin/clone.c
index 1d939af9d8..a6b5e2c948 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -711,6 +711,30 @@ static void update_head(const struct ref *our, const struct ref *remote,
 	}
 }
 
+static int enable_sparse_checkout_on_icase_fs(struct index_state *istate)
+{
+	int i;
+	int skip_count = 0;
+	FILE *fp = fopen(git_path("info/sparse"), "a+");
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (!ce_skip_worktree(ce))
+			continue;
+		if (!skip_count) {
+			git_config_set_multivar_gently("core.sparseCheckout",
+						       "true",
+						       CONFIG_REGEX_NONE, 0);
+			fprintf(fp, "# List of paths hidden by 'git clone'\n");
+		}
+		fprintf(fp, "/%s\n", ce->name);
+		skip_count++;
+	}
+	fclose(fp);
+
+	return skip_count;
+}
+
 static int checkout(int submodule_progress)
 {
 	struct object_id oid;
@@ -751,6 +775,7 @@ static int checkout(int submodule_progress)
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
+	opts.clone_checkout = 1;
 
 	tree = parse_tree_indirect(&oid);
 	parse_tree(tree);
@@ -761,6 +786,12 @@ static int checkout(int submodule_progress)
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
+	if (enable_sparse_checkout_on_icase_fs(&the_index))
+		warning("Paths that differ only in case are detected "
+			"and will not work correctly on this case-insensitive "
+			"filesystem. Sparse checkout has been enabled to hide "
+			"these paths.");
+
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   oid_to_hex(&oid), "1", NULL);
 
diff --git a/cache.h b/cache.h
index 8b447652a7..9ecf7ad952 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
+		 set_skipworktree_on_updated:1,
 		 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..ba21db63e7 100644
--- a/entry.c
+++ b/entry.c
@@ -447,6 +447,11 @@ int checkout_entry(struct cache_entry *ce,
 
 		if (!changed)
 			return 0;
+		if (state->set_skipworktree_on_updated) {
+			ce->ce_flags |= CE_SKIP_WORKTREE;
+			state->istate->cache_changed |= CE_ENTRY_CHANGED;
+			return 0;
+		}
 		if (!state->force) {
 			if (!state->quiet)
 				fprintf(stderr,
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..a8a24e0b13 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -358,6 +358,7 @@ static int check_updates(struct unpack_trees_options *o)
 	state.quiet = 1;
 	state.refresh_cache = 1;
 	state.istate = index;
+	state.set_skipworktree_on_updated = o->clone_checkout;
 
 	progress = get_progress(o);
 
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..8ebe2e2ec5 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -49,6 +49,7 @@ struct unpack_trees_options {
 		     aggressive,
 		     skip_unmerged,
 		     initial_checkout,
+		     clone_checkout,
 		     diff_index_cached,
 		     debug_unpack,
 		     skip_sparse_checkout,
-- 8< --

> While this might be fine for a closed system such as inside a
> company, this would make many open source repositories unusable,
> even when the files differing in case are nonfunctional (like README
> and readme).
> 
> This is actually one of a few ways people can make repositories that
> will show as modified on Windows or macOS systems, due to limitations in
> those OSes.  If you want to be sure that your repository is unmodified
> after clone, you can ensure that the output of git status --porcelain is
> empty, such as by checking for a zero exit from
> "test $(git status --porcelain | wc -l) -eq 0".
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  4:36   ` Duy Nguyen
@ 2018-07-28  4:45     ` Duy Nguyen
  2018-07-28  4:48       ` Jeff King
  0 siblings, 1 reply; 96+ messages in thread
From: Duy Nguyen @ 2018-07-28  4:45 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Paweł Paruzel, Git Mailing List

On Sat, Jul 28, 2018 at 6:36 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Fri, Jul 27, 2018 at 08:59:09PM +0000, brian m. carlson wrote:
> > On Fri, Jul 27, 2018 at 11:59:33AM +0200, Paweł Paruzel wrote:
> > > Hi,
> > >
> > > Lately, I have been wondering why my test files in repo are modified
> > > after I clone it. It turned out to be two files: boolStyle_t_f and
> > > boolStyle_T_F.
> > > The system that pushed those files was case sensitive while my mac
> > > after High Sierra update had APFS which is by default
> > > case-insensitive. I highly suggest that git clone threw an exception
> > > when files are case sensitive and being cloned to a case insensitive
> > > system. This has caused problems with overriding files for test cases
> > > without any warning.
> >
> > If we did what you proposed, it would be impossible to clone such a
> > repository on a case-insensitive system.
>
> I agree throwing a real exception would be bad. But how about detecting
> the problem and trying our best to keep the repo in somewhat usable
> state like this?
>
> This patch uses sparse checkout to hide all those paths that we fail
> to checkout, so you can still have a clean worktree to do things, as
> long as you don't touch those paths.

Side note. There may still be problems with this patch. Let's use
vim-colorschemes.git as an example, which has darkBlue.vim and
darkblue.vim.

Say we have checked out darkBlue.vim and hidden darkblue.vim. When you
update darkBlue.vim on worktree and then update the index, are we sure
we will update darkBlue.vim entry and not (hidden) darkblue.vim? I am
not sure. I don't think our lookup function is prepared to deal with
this. Maybe it's best to hide both of them.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  4:45     ` Duy Nguyen
@ 2018-07-28  4:48       ` Jeff King
  2018-07-28  5:11         ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-07-28  4:48 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List

On Sat, Jul 28, 2018 at 06:45:43AM +0200, Duy Nguyen wrote:

> > I agree throwing a real exception would be bad. But how about detecting
> > the problem and trying our best to keep the repo in somewhat usable
> > state like this?
> >
> > This patch uses sparse checkout to hide all those paths that we fail
> > to checkout, so you can still have a clean worktree to do things, as
> > long as you don't touch those paths.
> 
> Side note. There may still be problems with this patch. Let's use
> vim-colorschemes.git as an example, which has darkBlue.vim and
> darkblue.vim.
> 
> Say we have checked out darkBlue.vim and hidden darkblue.vim. When you
> update darkBlue.vim on worktree and then update the index, are we sure
> we will update darkBlue.vim entry and not (hidden) darkblue.vim? I am
> not sure. I don't think our lookup function is prepared to deal with
> this. Maybe it's best to hide both of them.

It might be enough to just issue a warning and give an advise() hint
that tells the user what's going on. Then they can decide what to do
(hide both paths, or just work in the index, or move to a different fs,
or complain to upstream).

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  4:48       ` Jeff King
@ 2018-07-28  5:11         ` Duy Nguyen
  2018-07-28  9:48           ` Simon Ruderich
  2018-07-28  9:56           ` Jeff King
  0 siblings, 2 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-07-28  5:11 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List

On Sat, Jul 28, 2018 at 12:48:57AM -0400, Jeff King wrote:
> On Sat, Jul 28, 2018 at 06:45:43AM +0200, Duy Nguyen wrote:
> 
> > > I agree throwing a real exception would be bad. But how about detecting
> > > the problem and trying our best to keep the repo in somewhat usable
> > > state like this?
> > >
> > > This patch uses sparse checkout to hide all those paths that we fail
> > > to checkout, so you can still have a clean worktree to do things, as
> > > long as you don't touch those paths.
> > 
> > Side note. There may still be problems with this patch. Let's use
> > vim-colorschemes.git as an example, which has darkBlue.vim and
> > darkblue.vim.
> > 
> > Say we have checked out darkBlue.vim and hidden darkblue.vim. When you
> > update darkBlue.vim on worktree and then update the index, are we sure
> > we will update darkBlue.vim entry and not (hidden) darkblue.vim? I am
> > not sure. I don't think our lookup function is prepared to deal with
> > this. Maybe it's best to hide both of them.
> 
> It might be enough to just issue a warning and give an advise() hint
> that tells the user what's going on. Then they can decide what to do
> (hide both paths, or just work in the index, or move to a different fs,
> or complain to upstream).

Yeah that may be the best option. Something like this perhaps? Not
sure how much detail the advice should be here.

-- 8< --
diff --git a/builtin/clone.c b/builtin/clone.c
index 1d939af9d8..b47ad5877b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -711,6 +711,30 @@ static void update_head(const struct ref *our, const struct ref *remote,
 	}
 }
 
+static int has_duplicate_icase_entries(struct index_state *istate)
+{
+	struct string_list list = STRING_LIST_INIT_NODUP;
+	int i;
+	int found = 0;
+
+	for (i = 0; i < istate->cache_nr; i++)
+		string_list_append(&list, istate->cache[i]->name);
+
+	list.cmp = strcasecmp;
+	string_list_sort(&list);
+
+	for (i = 1; i < list.nr; i++) {
+		if (strcasecmp(list.items[i-1].string,
+			       list.items[i].string))
+			continue;
+		found = 1;
+		break;
+	}
+	string_list_clear(&list, 0);
+
+	return found;
+}
+
 static int checkout(int submodule_progress)
 {
 	struct object_id oid;
@@ -761,6 +785,11 @@ static int checkout(int submodule_progress)
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
+	if (ignore_case && has_duplicate_icase_entries(&the_index))
+		warning(_("This repository has paths that only differ in case\n"
+			  "and you have a case-insenitive filesystem which will\n"
+			  "cause problems."));
+
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   oid_to_hex(&oid), "1", NULL);
 
-- 8< --

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  5:11         ` Duy Nguyen
@ 2018-07-28  9:48           ` Simon Ruderich
  2018-07-28  9:56           ` Jeff King
  1 sibling, 0 replies; 96+ messages in thread
From: Simon Ruderich @ 2018-07-28  9:48 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff King, brian m. carlson, Paweł Paruzel, Git Mailing List

On Sat, Jul 28, 2018 at 07:11:05AM +0200, Duy Nguyen wrote:
>  static int checkout(int submodule_progress)
>  {
>  	struct object_id oid;
> @@ -761,6 +785,11 @@ static int checkout(int submodule_progress)
>  	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
>  		die(_("unable to write new index file"));
>
> +	if (ignore_case && has_duplicate_icase_entries(&the_index))
> +		warning(_("This repository has paths that only differ in case\n"
> +			  "and you have a case-insenitive filesystem which will\n"
> +			  "cause problems."));
> +
>  	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>  			   oid_to_hex(&oid), "1", NULL);

I think the advice message should list the problematic file
names. Even though this might be quite verbose it will help those
affected to quickly find the problematic files to either fix this
on their own or report to upstream (unless there's already an
easy way to find those files - if so it should be mentioned in
the message).

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  5:11         ` Duy Nguyen
  2018-07-28  9:48           ` Simon Ruderich
@ 2018-07-28  9:56           ` Jeff King
  2018-07-28 18:05             ` brian m. carlson
  2018-07-29  5:26             ` Duy Nguyen
  1 sibling, 2 replies; 96+ messages in thread
From: Jeff King @ 2018-07-28  9:56 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List

On Sat, Jul 28, 2018 at 07:11:05AM +0200, Duy Nguyen wrote:

> > It might be enough to just issue a warning and give an advise() hint
> > that tells the user what's going on. Then they can decide what to do
> > (hide both paths, or just work in the index, or move to a different fs,
> > or complain to upstream).
> 
> Yeah that may be the best option. Something like this perhaps? Not
> sure how much detail the advice should be here.

Yeah, something along these lines.  I agree with Simon's comment
elsewhere that this should probably mention the names. I don't know if
we'd want to offer advice pointing them to using the sparse feature to
work around it.

> +static int has_duplicate_icase_entries(struct index_state *istate)
> +{
> +	struct string_list list = STRING_LIST_INIT_NODUP;
> +	int i;
> +	int found = 0;
> +
> +	for (i = 0; i < istate->cache_nr; i++)
> +		string_list_append(&list, istate->cache[i]->name);
> +
> +	list.cmp = strcasecmp;
> +	string_list_sort(&list);
> +
> +	for (i = 1; i < list.nr; i++) {
> +		if (strcasecmp(list.items[i-1].string,
> +			       list.items[i].string))
> +			continue;
> +		found = 1;
> +		break;
> +	}
> +	string_list_clear(&list, 0);
> +
> +	return found;
> +}

strcasecmp() will only catch a subset of the cases. We really need to
follow the same folding rules that the filesystem would.

For the case of clone, I actually wonder if we could detect during the
checkout step that a file already exists. Since we know that the
directory we started with was empty, then if it does, either:

  - there's some funny case-folding going on that means two paths in the
    repository map to the same name in the filesystem; or

  - somebody else is writing to the directory at the same time as us

Either of which I think would be worth warning about. I'm not sure if we
already lstat() the paths we're writing anyway as part of the checkout,
so we might even get the feature "for free".

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  9:56           ` Jeff King
@ 2018-07-28 18:05             ` brian m. carlson
  2018-07-29  5:26             ` Duy Nguyen
  1 sibling, 0 replies; 96+ messages in thread
From: brian m. carlson @ 2018-07-28 18:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Paweł Paruzel, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

On Sat, Jul 28, 2018 at 05:56:59AM -0400, Jeff King wrote:
> strcasecmp() will only catch a subset of the cases. We really need to
> follow the same folding rules that the filesystem would.
> 
> For the case of clone, I actually wonder if we could detect during the
> checkout step that a file already exists. Since we know that the
> directory we started with was empty, then if it does, either:
> 
>   - there's some funny case-folding going on that means two paths in the
>     repository map to the same name in the filesystem; or
> 
>   - somebody else is writing to the directory at the same time as us
> 
> Either of which I think would be worth warning about. I'm not sure if we
> already lstat() the paths we're writing anyway as part of the checkout,
> so we might even get the feature "for free".

This is possible to do.  From the bug I accidentally introduced in 2.16,
we know that on clone, there is a code path that is only traversed when
we hit this case and only on case-insensitive file systems.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-28  9:56           ` Jeff King
  2018-07-28 18:05             ` brian m. carlson
@ 2018-07-29  5:26             ` Duy Nguyen
  2018-07-29  9:28               ` Jeff King
  1 sibling, 1 reply; 96+ messages in thread
From: Duy Nguyen @ 2018-07-29  5:26 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List

On Sat, Jul 28, 2018 at 11:57 AM Jeff King <peff@peff.net> wrote:
> > +static int has_duplicate_icase_entries(struct index_state *istate)
> > +{
> > +     struct string_list list = STRING_LIST_INIT_NODUP;
> > +     int i;
> > +     int found = 0;
> > +
> > +     for (i = 0; i < istate->cache_nr; i++)
> > +             string_list_append(&list, istate->cache[i]->name);
> > +
> > +     list.cmp = strcasecmp;
> > +     string_list_sort(&list);
> > +
> > +     for (i = 1; i < list.nr; i++) {
> > +             if (strcasecmp(list.items[i-1].string,
> > +                            list.items[i].string))
> > +                     continue;
> > +             found = 1;
> > +             break;
> > +     }
> > +     string_list_clear(&list, 0);
> > +
> > +     return found;
> > +}
>
> strcasecmp() will only catch a subset of the cases. We really need to
> follow the same folding rules that the filesystem would.

True. But that's how we handle case insensitivity internally. If a
filesytem has more sophisticated folding rules then git will not work
well on that one anyway.

> For the case of clone, I actually wonder if we could detect during the
> checkout step that a file already exists. Since we know that the
> directory we started with was empty, then if it does, either:
>
>   - there's some funny case-folding going on that means two paths in the
>     repository map to the same name in the filesystem; or
>
>   - somebody else is writing to the directory at the same time as us

This is exactly what my first patch does (minus the sparse checkout
part).  But without knowing the exact folding rules, I don't think we
can locate this "somebody else" who wrote the first path. So if N
paths are treated the same by this filesystem, we could only report
N-1 of them.

If we want to report just one path when this happens though, then this
works quite well.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-29  5:26             ` Duy Nguyen
@ 2018-07-29  9:28               ` Jeff King
  2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-07-31 19:39                 ` Git clone and case sensitivity Jeff Hostetler
  0 siblings, 2 replies; 96+ messages in thread
From: Jeff King @ 2018-07-29  9:28 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List

On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote:

> > strcasecmp() will only catch a subset of the cases. We really need to
> > follow the same folding rules that the filesystem would.
> 
> True. But that's how we handle case insensitivity internally. If a
> filesytem has more sophisticated folding rules then git will not work
> well on that one anyway.

Hrm. Yeah, I guess that's the best we can do for the actual in-memory
checks. Everything else depends on doing an actual filesystem operation,
and our icase stuff kicks in way before then. I was mostly thinking of
HFS+ utf8 normalization weirdness, but I guess people are accustomed to
that by now.

> > For the case of clone, I actually wonder if we could detect during the
> > checkout step that a file already exists. Since we know that the
> > directory we started with was empty, then if it does, either:
> >
> >   - there's some funny case-folding going on that means two paths in the
> >     repository map to the same name in the filesystem; or
> >
> >   - somebody else is writing to the directory at the same time as us
> 
> This is exactly what my first patch does (minus the sparse checkout
> part).

Right, sorry, I should have read that one more carefully.

> But without knowing the exact folding rules, I don't think we can
> locate this "somebody else" who wrote the first path. So if N paths
> are treated the same by this filesystem, we could only report N-1 of
> them.
> 
> If we want to report just one path when this happens though, then this
> works quite well.

Hmm. Since most such systems are case-preserving, would it be possible
to report the name of the existing file? Doing it via opendir/readdir is
hacky, and anyway puts the burden on us to find the matching name. Doing
it via fstat() on the opened file doesn't work because at that the
filesystem has resolved the name to an inode.

So yeah, perhaps strcasecmp() is the best we can do (I do agree that
being able to mention all of the conflicting names is a benefit).

I guess we should be using fspathcmp(), though, in case it later learns
to be smarter.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-29  9:28               ` Jeff King
@ 2018-07-30 15:27                 ` Nguyễn Thái Ngọc Duy
  2018-07-31 18:23                   ` Torsten Bögershausen
                                     ` (3 more replies)
  2018-07-31 19:39                 ` Git clone and case sensitivity Jeff Hostetler
  1 sibling, 4 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-30 15:27 UTC (permalink / raw)
  To: peff; +Cc: git, pawelparuzel95, pclouds, sandals

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what's exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. I have not suggested any way to work around or fix this
problem. But I guess we could probably have a section in
Documentation/ dedicated to this problem and point there instead of
a long advice in this warning.

Another thing we probably should do is catch in "git checkout" too,
not just "git clone" since your linux/unix colleage colleague may
accidentally add some files that your mac/windows machine is not very
happy with. But then there's another problem, once the problem is
known, we probably should stop spamming this warning at every
checkout, but how?

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..32738c2737 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const struct ref *remote,
 	}
 }
 
+static void find_duplicate_icase_entries(struct index_state *istate,
+					 struct string_list *dup)
+{
+	struct string_list list = STRING_LIST_INIT_NODUP;
+	int i;
+
+	for (i = 0; i < istate->cache_nr; i++)
+		string_list_append(&list, istate->cache[i]->name);
+
+	list.cmp = fspathcmp;
+	string_list_sort(&list);
+
+	for (i = 1; i < list.nr; i++) {
+		const char *cur = list.items[i].string;
+		const char *prev = list.items[i - 1].string;
+
+		if (dup->nr &&
+		    !fspathcmp(cur, dup->items[dup->nr - 1].string)) {
+			string_list_append(dup, cur);
+		} else if (!fspathcmp(cur, prev)) {
+			string_list_append(dup, prev);
+			string_list_append(dup, cur);
+		}
+	}
+	string_list_clear(&list, 0);
+}
+
 static int checkout(int submodule_progress)
 {
 	struct object_id oid;
@@ -761,6 +788,20 @@ static int checkout(int submodule_progress)
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
+	if (ignore_case) {
+		struct string_list dup = STRING_LIST_INIT_DUP;
+		int i;
+
+		find_duplicate_icase_entries(&the_index, &dup);
+		if (dup.nr) {
+			warning(_("the following paths in this repository only differ in case and will\n"
+				  "cause problems because you have cloned it on an case-insensitive filesytem:\n"));
+			for (i = 0; i < dup.nr; i++)
+				fprintf(stderr, "\t%s\n", dup.items[i].string);
+		}
+		string_list_clear(&dup, 0);
+	}
+
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   oid_to_hex(&oid), "1", NULL);
 
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
@ 2018-07-31 18:23                   ` Torsten Bögershausen
  2018-08-01 15:25                     ` Duy Nguyen
  2018-07-31 18:44                   ` Elijah Newren
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 96+ messages in thread
From: Torsten Bögershausen @ 2018-07-31 18:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: peff, git, pawelparuzel95, sandals

On Mon, Jul 30, 2018 at 05:27:55PM +0200, Nguyễn Thái Ngọc Duy wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what's exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. I have not suggested any way to work around or fix this
> problem. But I guess we could probably have a section in
> Documentation/ dedicated to this problem and point there instead of
> a long advice in this warning.
> 
> Another thing we probably should do is catch in "git checkout" too,
> not just "git clone" since your linux/unix colleage colleague may
> accidentally add some files that your mac/windows machine is not very
> happy with. But then there's another problem, once the problem is
> known, we probably should stop spamming this warning at every
> checkout, but how?
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/clone.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..32738c2737 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const struct ref *remote,
>  	}
>  }
>  
> +static void find_duplicate_icase_entries(struct index_state *istate,
> +					 struct string_list *dup)
> +{
> +	struct string_list list = STRING_LIST_INIT_NODUP;
> +	int i;
> +
> +	for (i = 0; i < istate->cache_nr; i++)
> +		string_list_append(&list, istate->cache[i]->name);
> +
> +	list.cmp = fspathcmp;
> +	string_list_sort(&list);
> +
> +	for (i = 1; i < list.nr; i++) {
> +		const char *cur = list.items[i].string;
> +		const char *prev = list.items[i - 1].string;
> +
> +		if (dup->nr &&
> +		    !fspathcmp(cur, dup->items[dup->nr - 1].string)) {
> +			string_list_append(dup, cur);
> +		} else if (!fspathcmp(cur, prev)) {
> +			string_list_append(dup, prev);
> +			string_list_append(dup, cur);
> +		}
> +	}
> +	string_list_clear(&list, 0);
> +}
> +
>  static int checkout(int submodule_progress)
>  {
>  	struct object_id oid;
> @@ -761,6 +788,20 @@ static int checkout(int submodule_progress)
>  	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
>  		die(_("unable to write new index file"));
>  
> +	if (ignore_case) {
> +		struct string_list dup = STRING_LIST_INIT_DUP;
> +		int i;
> +
> +		find_duplicate_icase_entries(&the_index, &dup);
> +		if (dup.nr) {
> +			warning(_("the following paths in this repository only differ in case and will\n"
> +				  "cause problems because you have cloned it on an case-insensitive filesytem:\n"));

Thanks for the patch.
I wonder if we can tell the users more about the "problems"
and how to avoid them, or to live with them.

This is more loud thinking:

"The following paths only differ in case\n"
"One a case-insensitive file system only one at a time can be present\n"
"You may rename one like this:\n"
"git checkout <file> && git mv <file> <file>.1\n"

> +				fprintf(stderr, "\t%s\n", dup.items[i].string);

Another question:
Do we need any quote_path() here ?
(This may be overkill, since typically the repos with conflicting names
only use ASCII.)

> +		}
> +		string_list_clear(&dup, 0);
> +	}
> +
>  	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>  			   oid_to_hex(&oid), "1", NULL);
>  
> -- 
> 2.18.0.656.gda699b98b3
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-07-31 18:23                   ` Torsten Bögershausen
@ 2018-07-31 18:44                   ` Elijah Newren
  2018-07-31 19:12                     ` Junio C Hamano
  2018-08-01 15:21                     ` Duy Nguyen
  2018-07-31 19:13                   ` Junio C Hamano
  2018-08-07 19:01                   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  3 siblings, 2 replies; 96+ messages in thread
From: Elijah Newren @ 2018-07-31 18:44 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Jeff King, Git Mailing List, pawelparuzel95, brian m. carlson

On Mon, Jul 30, 2018 at 8:27 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what's exactly is "dirty".

"what" rather than "what's"?

> This patch helps the situation a bit by pointing out the problem at
> clone time. I have not suggested any way to work around or fix this
> problem. But I guess we could probably have a section in
> Documentation/ dedicated to this problem and point there instead of
> a long advice in this warning.
>
> Another thing we probably should do is catch in "git checkout" too,
> not just "git clone" since your linux/unix colleage colleague may

drop "colleage", keep "colleague"?

> accidentally add some files that your mac/windows machine is not very
> happy with. But then there's another problem, once the problem is
> known, we probably should stop spamming this warning at every
> checkout, but how?

Good questions.  I have no answers.

>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/clone.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..32738c2737 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const struct ref *remote,
>         }
>  }
>
> +static void find_duplicate_icase_entries(struct index_state *istate,
> +                                        struct string_list *dup)
> +{
> +       struct string_list list = STRING_LIST_INIT_NODUP;
> +       int i;
> +
> +       for (i = 0; i < istate->cache_nr; i++)
> +               string_list_append(&list, istate->cache[i]->name);
> +
> +       list.cmp = fspathcmp;
> +       string_list_sort(&list);

So, you sort the list by fspathcmp to get the entries that differ in
case only next to each other.  Makes sense...

> +
> +       for (i = 1; i < list.nr; i++) {
> +               const char *cur = list.items[i].string;
> +               const char *prev = list.items[i - 1].string;
> +
> +               if (dup->nr &&
> +                   !fspathcmp(cur, dup->items[dup->nr - 1].string)) {
> +                       string_list_append(dup, cur);

If we have at least one duplicate in dup (and currently we'd have to
have at least two), and we now hit yet another (i.e. a third or
fourth...) way of spelling the same path, then we add it.

> +               } else if (!fspathcmp(cur, prev)) {
> +                       string_list_append(dup, prev);
> +                       string_list_append(dup, cur);
> +               }

...otherwise, if we find a duplicate, we add both spellings to the dup list.

> +       }
> +       string_list_clear(&list, 0);
> +}
> +
>  static int checkout(int submodule_progress)
>  {
>         struct object_id oid;
> @@ -761,6 +788,20 @@ static int checkout(int submodule_progress)
>         if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
>                 die(_("unable to write new index file"));
>
> +       if (ignore_case) {
> +               struct string_list dup = STRING_LIST_INIT_DUP;
> +               int i;
> +
> +               find_duplicate_icase_entries(&the_index, &dup);
> +               if (dup.nr) {
> +                       warning(_("the following paths in this repository only differ in case and will\n"

Perhaps I'm being excessively pedantic, but what if there are multiple
pairs of paths differing in case?  E.g. if someone has readme.txt,
README.txt, foo.txt, and FOO.txt, you'll list all four files but
readme.txt and foo.txt do not only differ in case.

Maybe something like "...only differ in case from another path and
will... " or is that too verbose and annoying?

> +                                 "cause problems because you have cloned it on an case-insensitive filesytem:\n"));
> +                       for (i = 0; i < dup.nr; i++)
> +                               fprintf(stderr, "\t%s\n", dup.items[i].string);
> +               }
> +               string_list_clear(&dup, 0);
> +       }
> +
>         err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>                            oid_to_hex(&oid), "1", NULL);

Is it worth attempting to also warn about paths that only differ in
UTF-normalization on relevant MacOS systems?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 18:44                   ` Elijah Newren
@ 2018-07-31 19:12                     ` Junio C Hamano
  2018-07-31 19:29                       ` Jeff King
  2018-08-01 15:21                     ` Duy Nguyen
  1 sibling, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-07-31 19:12 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Nguyễn Thái Ngọc Duy, Jeff King,
	Git Mailing List, pawelparuzel95, brian m. carlson

Elijah Newren <newren@gmail.com> writes:

> Is it worth attempting to also warn about paths that only differ in
> UTF-normalization on relevant MacOS systems?

I hate to bring up a totally different approach this late in the
party, but I wonder if it makes more sense to take advantage of
"clone" being a command that starts from an empty working tree.

builtin/clone.c::checkout() drives a single-tree unpack_trees(),
using oneway_merge() as its callback and at the end, eventually
unpack_trees.c:check_updates() will call into checkout_entry()
to perform the usual "unlink and then create" dance.

I wonder if it makes sense to introduce a new option to tell the
machinery to report when the final checkout_entry() notices that it
needed to remove the working tree file to make room (perhaps that
bit would go in "struct unpack_trees_options").  In the initial
checkout codepath for a freshly cloned repository, that would only
happen when your tree has two (or more) paths that gets smashed
by case insensitive or UTF-normalizing filesystem, and the code we
would maintain do not have to care how exactly the filesystem
collapses two (or more) paths if we go that way.  We only need to
report "we tried to check out X but it seems your filesystem equates
something else that is also in the project to X".




^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-07-31 18:23                   ` Torsten Bögershausen
  2018-07-31 18:44                   ` Elijah Newren
@ 2018-07-31 19:13                   ` Junio C Hamano
  2018-08-01 15:16                     ` Duy Nguyen
  2018-08-07 19:01                   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  3 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-07-31 19:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: peff, git, pawelparuzel95, sandals

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Another thing we probably should do is catch in "git checkout" too,
> not just "git clone" since your linux/unix colleage colleague may
> accidentally add some files that your mac/windows machine is not very
> happy with.

Then you would catch it not in checkout but in add, no?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 19:12                     ` Junio C Hamano
@ 2018-07-31 19:29                       ` Jeff King
  2018-07-31 20:12                         ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-07-31 19:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, Nguyễn Thái Ngọc Duy,
	Git Mailing List, pawelparuzel95, brian m. carlson

On Tue, Jul 31, 2018 at 12:12:58PM -0700, Junio C Hamano wrote:

> Elijah Newren <newren@gmail.com> writes:
> 
> > Is it worth attempting to also warn about paths that only differ in
> > UTF-normalization on relevant MacOS systems?
> 
> I hate to bring up a totally different approach this late in the
> party, but I wonder if it makes more sense to take advantage of
> "clone" being a command that starts from an empty working tree.
> 
> builtin/clone.c::checkout() drives a single-tree unpack_trees(),
> using oneway_merge() as its callback and at the end, eventually
> unpack_trees.c:check_updates() will call into checkout_entry()
> to perform the usual "unlink and then create" dance.
> 
> I wonder if it makes sense to introduce a new option to tell the
> machinery to report when the final checkout_entry() notices that it
> needed to remove the working tree file to make room (perhaps that
> bit would go in "struct unpack_trees_options").  In the initial
> checkout codepath for a freshly cloned repository, that would only
> happen when your tree has two (or more) paths that gets smashed
> by case insensitive or UTF-normalizing filesystem, and the code we
> would maintain do not have to care how exactly the filesystem
> collapses two (or more) paths if we go that way.  We only need to
> report "we tried to check out X but it seems your filesystem equates
> something else that is also in the project to X".

Heh. See my similar suggestion in:

  https://public-inbox.org/git/20180728095659.GA21450@sigill.intra.peff.net/

and the response from Duy.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Git clone and case sensitivity
  2018-07-29  9:28               ` Jeff King
  2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
@ 2018-07-31 19:39                 ` Jeff Hostetler
  1 sibling, 0 replies; 96+ messages in thread
From: Jeff Hostetler @ 2018-07-31 19:39 UTC (permalink / raw)
  To: Jeff King, Duy Nguyen
  Cc: brian m. carlson, Paweł Paruzel, Git Mailing List



On 7/29/2018 5:28 AM, Jeff King wrote:
> On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote:
> 
>>> strcasecmp() will only catch a subset of the cases. We really need to
>>> follow the same folding rules that the filesystem would.
>>
>> True. But that's how we handle case insensitivity internally. If a
>> filesytem has more sophisticated folding rules then git will not work
>> well on that one anyway.
> 
> Hrm. Yeah, I guess that's the best we can do for the actual in-memory
> checks. Everything else depends on doing an actual filesystem operation,
> and our icase stuff kicks in way before then. I was mostly thinking of
> HFS+ utf8 normalization weirdness, but I guess people are accustomed to
> that by now.
> 
>>> For the case of clone, I actually wonder if we could detect during the
>>> checkout step that a file already exists. Since we know that the
>>> directory we started with was empty, then if it does, either:
>>>
>>>    - there's some funny case-folding going on that means two paths in the
>>>      repository map to the same name in the filesystem; or
>>>
>>>    - somebody else is writing to the directory at the same time as us
>>
>> This is exactly what my first patch does (minus the sparse checkout
>> part).
> 
> Right, sorry, I should have read that one more carefully.
> 
>> But without knowing the exact folding rules, I don't think we can
>> locate this "somebody else" who wrote the first path. So if N paths
>> are treated the same by this filesystem, we could only report N-1 of
>> them.
>>
>> If we want to report just one path when this happens though, then this
>> works quite well.
> 
> Hmm. Since most such systems are case-preserving, would it be possible
> to report the name of the existing file? Doing it via opendir/readdir is
> hacky, and anyway puts the burden on us to find the matching name. Doing
> it via fstat() on the opened file doesn't work because at that the
> filesystem has resolved the name to an inode.
> 
> So yeah, perhaps strcasecmp() is the best we can do (I do agree that
> being able to mention all of the conflicting names is a benefit).
> 
> I guess we should be using fspathcmp(), though, in case it later learns
> to be smarter.
> 
> -Peff
> 

As has already been mentioned, this gets into weird territory really
fast, between case folding, final space/dot on windows, utf8 NFC/NFD
weirdness on the mac, utf8 invisible chars on the mac, long/short names
on windows, and etc.

And that's just for filenames.  Things really get weird if directory
names have these ambiguities.

Perhaps just print the problematic paths (where the collision is
detected) and let the user decide how to correct them.

Perhaps we could have a separate tool that could scan the index or
commit for potential conflicts and warn them in advance (granted, it
might not be perfect and may report a few false positives).

Forcing them into a sparse-checkout situation might be over their
skill level.

Jeff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 19:29                       ` Jeff King
@ 2018-07-31 20:12                         ` Junio C Hamano
  2018-07-31 20:37                           ` Jeff King
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-07-31 20:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren, Nguyễn Thái Ngọc Duy,
	Git Mailing List, pawelparuzel95, brian m. carlson

Jeff King <peff@peff.net> writes:

> On Tue, Jul 31, 2018 at 12:12:58PM -0700, Junio C Hamano wrote:
> ...
>> collapses two (or more) paths if we go that way.  We only need to
>> report "we tried to check out X but it seems your filesystem equates
>> something else that is also in the project to X".
>
> Heh. See my similar suggestion in:
>
>   https://public-inbox.org/git/20180728095659.GA21450@sigill.intra.peff.net/
>
> and the response from Duy.

Yes, but is there a reason why we need to report what that
"something else" is?

Presumably we are already in an error codepath, so if it is
absolutely necessary, then we can issue a lstat() to grab the inum
for the path we are about to create, iterate over the previously
checked out paths issuing lstat() and see which one yields the same
inum, to find the one who is the culprit.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 20:12                         ` Junio C Hamano
@ 2018-07-31 20:37                           ` Jeff King
  2018-07-31 20:57                             ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-07-31 20:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, Nguyễn Thái Ngọc Duy,
	Git Mailing List, pawelparuzel95, brian m. carlson

On Tue, Jul 31, 2018 at 01:12:14PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Jul 31, 2018 at 12:12:58PM -0700, Junio C Hamano wrote:
> > ...
> >> collapses two (or more) paths if we go that way.  We only need to
> >> report "we tried to check out X but it seems your filesystem equates
> >> something else that is also in the project to X".
> >
> > Heh. See my similar suggestion in:
> >
> >   https://public-inbox.org/git/20180728095659.GA21450@sigill.intra.peff.net/
> >
> > and the response from Duy.
> 
> Yes, but is there a reason why we need to report what that
> "something else" is?

I don't think it's strictly necessary, but it probably makes things
easier for the user. That said...

> Presumably we are already in an error codepath, so if it is
> absolutely necessary, then we can issue a lstat() to grab the inum
> for the path we are about to create, iterate over the previously
> checked out paths issuing lstat() and see which one yields the same
> inum, to find the one who is the culprit.

Yes, this is the cleverness I was missing in my earlier response.

So it seems do-able, and I like that this incurs no cost in the
non-error case.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 20:37                           ` Jeff King
@ 2018-07-31 20:57                             ` Junio C Hamano
  2018-08-01 21:20                               ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-07-31 20:57 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren, Nguyễn Thái Ngọc Duy,
	Git Mailing List, pawelparuzel95, brian m. carlson

Jeff King <peff@peff.net> writes:

>> Presumably we are already in an error codepath, so if it is
>> absolutely necessary, then we can issue a lstat() to grab the inum
>> for the path we are about to create, iterate over the previously
>> checked out paths issuing lstat() and see which one yields the same
>> inum, to find the one who is the culprit.
>
> Yes, this is the cleverness I was missing in my earlier response.
>
> So it seems do-able, and I like that this incurs no cost in the
> non-error case.

Not so fast, unfortunately.  

I suspect that some filesystems do not give us inum that we can use
for that "identity" purpose, and they tend to be the ones with the
case smashing characteristics where we need this code in the error
path the most X-<.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 19:13                   ` Junio C Hamano
@ 2018-08-01 15:16                     ` Duy Nguyen
  0 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-01 15:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Git Mailing List, Paweł Paruzel, brian m. carlson

On Tue, Jul 31, 2018 at 9:13 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
> > Another thing we probably should do is catch in "git checkout" too,
> > not just "git clone" since your linux/unix colleage colleague may
> > accidentally add some files that your mac/windows machine is not very
> > happy with.
>
> Then you would catch it not in checkout but in add, no?

No because the guy who added these may have done it on a
case-sensitive filesystem. Only later when his friend fetches new
changes on a case-insensitive filesytem, the problem becomes real. If
in this scenario, core.ignore is enforced on all machines, then yes we
could catch it at "git add" (and we should already do that or we have
a bug). At least in open source project setting, I think enforcing
core.ignore will not work.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 18:44                   ` Elijah Newren
  2018-07-31 19:12                     ` Junio C Hamano
@ 2018-08-01 15:21                     ` Duy Nguyen
  1 sibling, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-01 15:21 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jeff King, Git Mailing List, Paweł Paruzel, brian m. carlson

On Tue, Jul 31, 2018 at 8:44 PM Elijah Newren <newren@gmail.com> wrote:
> Is it worth attempting to also warn about paths that only differ in
> UTF-normalization on relevant MacOS systems?

Down this thread, Jeff Hostetler drew a scarier picture of "case"
handling on MacOS and Windows. I think we should start with support
things like utf normalization... in core.ignore first before doing the
warning stuff. At that point we know well how to detect and warn, or
any other pitfalls.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 18:23                   ` Torsten Bögershausen
@ 2018-08-01 15:25                     ` Duy Nguyen
  0 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-01 15:25 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Jeff King, Git Mailing List, Paweł Paruzel, brian m. carlson

On Tue, Jul 31, 2018 at 8:23 PM Torsten Bögershausen <tboegi@web.de> wrote:
> I wonder if we can tell the users more about the "problems"
> and how to avoid them, or to live with them.
>
> This is more loud thinking:
>
> "The following paths only differ in case\n"
> "One a case-insensitive file system only one at a time can be present\n"
> "You may rename one like this:\n"
> "git checkout <file> && git mv <file> <file>.1\n"

Jeff gave a couple more options [1] to fix or workaround this. I think
the problem is there is no single recommended way to deal with it. If
there is, we can describe in this warning. But listing multiple
options in this warning may be too much (the wall of text could easily
take half a screen).

Or if people agree on _one_ suggestion, I will gladly put it in.

[1] https://public-inbox.org/git/CACsJy8A_uZM7nUmyERNHJMya0EyRQYTV7Dp2ikLznxnbOQU6tw@mail.gmail.com/T/#m60fedd7dc928a4d52eb5919811f84556f391a7b3

> > +                             fprintf(stderr, "\t%s\n", dup.items[i].string);
>
> Another question:
> Do we need any quote_path() here ?
> (This may be overkill, since typically the repos with conflicting names
> only use ASCII.)

Would be good to show trailing spaces in path names, so yes.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-07-31 20:57                             ` Junio C Hamano
@ 2018-08-01 21:20                               ` Junio C Hamano
  2018-08-02 14:43                                 ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-01 21:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren, Nguyễn Thái Ngọc Duy,
	Git Mailing List, pawelparuzel95, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>>> Presumably we are already in an error codepath, so if it is
>>> absolutely necessary, then we can issue a lstat() to grab the inum
>>> for the path we are about to create, iterate over the previously
>>> checked out paths issuing lstat() and see which one yields the same
>>> inum, to find the one who is the culprit.
>>
>> Yes, this is the cleverness I was missing in my earlier response.
>>
>> So it seems do-able, and I like that this incurs no cost in the
>> non-error case.
>
> Not so fast, unfortunately.  
>
> I suspect that some filesystems do not give us inum that we can use
> for that "identity" purpose, and they tend to be the ones with the
> case smashing characteristics where we need this code in the error
> path the most X-<.

But even if inum is unreliable, we should be able to use other
clues, perhaps the same set of fields we use for cached stat
matching optimization we use for "diff" plumbing commands, to
implement the error report.  The more important part of the idea is
that we already need to notice that we need to remove a path that is
in the working tree while doing the checkout, so the alternative
approach won't incur any extra cost for normal cases where the
project being checked out does not have two files whose pathnames
are only different in case (or checking out such an offending
project to a case sensitive filesytem, of course).

So I guess it still _is_ workable.  Any takers?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-01 21:20                               ` Junio C Hamano
@ 2018-08-02 14:43                                 ` Duy Nguyen
  2018-08-02 16:27                                   ` Junio C Hamano
  2018-08-03 14:28                                   ` Torsten Bögershausen
  0 siblings, 2 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-02 14:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson

On Wed, Aug 1, 2018 at 11:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Jeff King <peff@peff.net> writes:
> >
> >>> Presumably we are already in an error codepath, so if it is
> >>> absolutely necessary, then we can issue a lstat() to grab the inum
> >>> for the path we are about to create, iterate over the previously
> >>> checked out paths issuing lstat() and see which one yields the same
> >>> inum, to find the one who is the culprit.
> >>
> >> Yes, this is the cleverness I was missing in my earlier response.
> >>
> >> So it seems do-able, and I like that this incurs no cost in the
> >> non-error case.
> >
> > Not so fast, unfortunately.
> >
> > I suspect that some filesystems do not give us inum that we can use
> > for that "identity" purpose, and they tend to be the ones with the
> > case smashing characteristics where we need this code in the error
> > path the most X-<.
>
> But even if inum is unreliable, we should be able to use other
> clues, perhaps the same set of fields we use for cached stat
> matching optimization we use for "diff" plumbing commands, to
> implement the error report.  The more important part of the idea is
> that we already need to notice that we need to remove a path that is
> in the working tree while doing the checkout, so the alternative
> approach won't incur any extra cost for normal cases where the
> project being checked out does not have two files whose pathnames
> are only different in case (or checking out such an offending
> project to a case sensitive filesytem, of course).
>
> So I guess it still _is_ workable.  Any takers?

OK so we're going back to the original way of checking that we check
out the different files on the same place (because fs is icase) and
try to collect all paths for reporting, yes? I can give it another go
(but of course if anybody else steps up, I'd very gladly hand this
over)
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-02 14:43                                 ` Duy Nguyen
@ 2018-08-02 16:27                                   ` Junio C Hamano
  2018-08-02 19:06                                     ` Jeff King
  2018-08-03 14:28                                   ` Torsten Bögershausen
  1 sibling, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-02 16:27 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff King, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson

Duy Nguyen <pclouds@gmail.com> writes:

>> But even if inum is unreliable, we should be able to use other
>> clues, perhaps the same set of fields we use for cached stat
>> matching optimization we use for "diff" plumbing commands, to
>> implement the error report.  The more important part of the idea is
>> that we already need to notice that we need to remove a path that is
>> in the working tree while doing the checkout, so the alternative
>> approach won't incur any extra cost for normal cases where the
>> project being checked out does not have two files whose pathnames
>> are only different in case (or checking out such an offending
>> project to a case sensitive filesytem, of course).
>>
>> So I guess it still _is_ workable.  Any takers?
>
> OK so we're going back to the original way of checking that we check
> out the different files on the same place (because fs is icase) and
> try to collect all paths for reporting, yes? I can give it another go
> (but of course if anybody else steps up, I'd very gladly hand this
> over)

Detect and report, definitely yes; I am not sure about collect all
(personally I am OK if we stopped at reporting "I tried to check out
X but your project tree has something else that is turned to X by
your pathname-smashing filesystem" without making it a requirement
to report what the other one that conflict with X is.  Of course,
reporting the other side _is_ nicer and I'd be happier if we can do
so without too much ugly code, but I do not think it is a hard
requirement.

Thanks.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-02 16:27                                   ` Junio C Hamano
@ 2018-08-02 19:06                                     ` Jeff King
  2018-08-02 21:14                                       ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-02 19:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson

On Thu, Aug 02, 2018 at 09:27:31AM -0700, Junio C Hamano wrote:

> > OK so we're going back to the original way of checking that we check
> > out the different files on the same place (because fs is icase) and
> > try to collect all paths for reporting, yes? I can give it another go
> > (but of course if anybody else steps up, I'd very gladly hand this
> > over)
> 
> Detect and report, definitely yes; I am not sure about collect all
> (personally I am OK if we stopped at reporting "I tried to check out
> X but your project tree has something else that is turned to X by
> your pathname-smashing filesystem" without making it a requirement
> to report what the other one that conflict with X is.  Of course,
> reporting the other side _is_ nicer and I'd be happier if we can do
> so without too much ugly code, but I do not think it is a hard
> requirement.

Yeah, I think it would be OK to issue the warning for the conflicted
path, and then if we _can_ produce the secondary list of colliding
paths, do so. Even on a system with working inodes, we may not come up
with a match (e.g., if it wasn't us who wrote the file, but rather we
raced with some other process).

I also wonder if Windows could return some other file-unique identifier
that would work in place of an inode here. That would be pretty easy to
swap in via an #ifdef's helper function. I'd be OK shipping without that
and letting Windows folks fill it in later (as long as we do not do
anything too stupid until then, like claim all of the inode==0 files are
the same).

-Peff

PS It occurs to me that doing this naively (re-scan the entries already
   checked out when we see a collision) ends up quadratic over the
   number of entries in the worst case. That may not matter. You'd only
   have a handful of collisions normally, and anybody malicious can
   already git-bomb your checkout anyway. If we care, an alternative
   would be to set a flag for "I saw some collisions", and then follow
   up with a single pass putting entries into a hashmap of
   inode->filename.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-02 19:06                                     ` Jeff King
@ 2018-08-02 21:14                                       ` Junio C Hamano
  2018-08-02 21:28                                         ` Jeff King
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-02 21:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson

Jeff King <peff@peff.net> writes:

> I also wonder if Windows could return some other file-unique identifier
> that would work in place of an inode here. That would be pretty easy to
> swap in via an #ifdef's helper function. I'd be OK shipping without that
> and letting Windows folks fill it in later (as long as we do not do
> anything too stupid until then, like claim all of the inode==0 files are
> the same).

Yeah, but such a useful file-unique identifier would probably be
used in place of inum in their (l)stat emulation already, if exists,
no?

> PS It occurs to me that doing this naively (re-scan the entries already
>    checked out when we see a collision) ends up quadratic over the
>    number of entries in the worst case. That may not matter. You'd only
>    have a handful of collisions normally, and anybody malicious can
>    already git-bomb your checkout anyway. If we care, an alternative
>    would be to set a flag for "I saw some collisions", and then follow
>    up with a single pass putting entries into a hashmap of
>    inode->filename.

Yeah, that makes sense.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-02 21:14                                       ` Junio C Hamano
@ 2018-08-02 21:28                                         ` Jeff King
  2018-08-03 18:23                                           ` Jeff Hostetler
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-02 21:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson

On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I also wonder if Windows could return some other file-unique identifier
> > that would work in place of an inode here. That would be pretty easy to
> > swap in via an #ifdef's helper function. I'd be OK shipping without that
> > and letting Windows folks fill it in later (as long as we do not do
> > anything too stupid until then, like claim all of the inode==0 files are
> > the same).
> 
> Yeah, but such a useful file-unique identifier would probably be
> used in place of inum in their (l)stat emulation already, if exists,
> no?

Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
maybe it's simply impossible. I don't know much about Windows. Some
searching implies that NTFS does have a "file index" concept which is
supposed to be unique.

At any rate, until we have an actual plan for Windows, I think it would
make sense only to split the cases into "has working inodes" and
"other", and make sure "other" does something sensible in the meantime
(like mention the conflict, but skip trying to list duplicates).

When somebody wants to work on Windows support, then we can figure out
if it just needs to wrap the "get unique identifier" operation, or if it
would use a totally different algorithm.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-02 14:43                                 ` Duy Nguyen
  2018-08-02 16:27                                   ` Junio C Hamano
@ 2018-08-03 14:28                                   ` Torsten Bögershausen
  1 sibling, 0 replies; 96+ messages in thread
From: Torsten Bögershausen @ 2018-08-03 14:28 UTC (permalink / raw)
  To: Duy Nguyen, Junio C Hamano
  Cc: Jeff King, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson

On 2018-08-02 15:43, Duy Nguyen wrote:
> On Wed, Aug 1, 2018 at 11:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Jeff King <peff@peff.net> writes:
>>>
>>>>> Presumably we are already in an error codepath, so if it is
>>>>> absolutely necessary, then we can issue a lstat() to grab the inum
>>>>> for the path we are about to create, iterate over the previously
>>>>> checked out paths issuing lstat() and see which one yields the same
>>>>> inum, to find the one who is the culprit.
>>>>
>>>> Yes, this is the cleverness I was missing in my earlier response.
>>>>
>>>> So it seems do-able, and I like that this incurs no cost in the
>>>> non-error case.
>>>
>>> Not so fast, unfortunately.
>>>
>>> I suspect that some filesystems do not give us inum that we can use
>>> for that "identity" purpose, and they tend to be the ones with the
>>> case smashing characteristics where we need this code in the error
>>> path the most X-<.
>>
>> But even if inum is unreliable, we should be able to use other
>> clues, perhaps the same set of fields we use for cached stat
>> matching optimization we use for "diff" plumbing commands, to
>> implement the error report.  The more important part of the idea is
>> that we already need to notice that we need to remove a path that is
>> in the working tree while doing the checkout, so the alternative
>> approach won't incur any extra cost for normal cases where the
>> project being checked out does not have two files whose pathnames
>> are only different in case (or checking out such an offending
>> project to a case sensitive filesytem, of course).
>>
>> So I guess it still _is_ workable.  Any takers?
> 
> OK so we're going back to the original way of checking that we check
> out the different files on the same place (because fs is icase) and
> try to collect all paths for reporting, yes?

I would say: Yes.

> I can give it another go
> (but of course if anybody else steps up, I'd very gladly hand this
> over)
> 

Not at the moment.



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-02 21:28                                         ` Jeff King
@ 2018-08-03 18:23                                           ` Jeff Hostetler
  2018-08-03 18:49                                             ` Junio C Hamano
  2018-08-03 18:53                                             ` Jeff King
  0 siblings, 2 replies; 96+ messages in thread
From: Jeff Hostetler @ 2018-08-03 18:23 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel,
	brian m. carlson



On 8/2/2018 5:28 PM, Jeff King wrote:
> On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote:
> 
>> Jeff King <peff@peff.net> writes:
>>
>>> I also wonder if Windows could return some other file-unique identifier
>>> that would work in place of an inode here. That would be pretty easy to
>>> swap in via an #ifdef's helper function. I'd be OK shipping without that
>>> and letting Windows folks fill it in later (as long as we do not do
>>> anything too stupid until then, like claim all of the inode==0 files are
>>> the same).
>>
>> Yeah, but such a useful file-unique identifier would probably be
>> used in place of inum in their (l)stat emulation already, if exists,
>> no?
> 
> Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
> maybe it's simply impossible. I don't know much about Windows. Some
> searching implies that NTFS does have a "file index" concept which is
> supposed to be unique.

This is hard and/or expensive on Windows.  Yes, you can get the
"file index" values for an open file handle with a cost similar to
an fstat().  Unfortunately, the FindFirst/FindNext routines (equivalent
to the opendir/readdir routines), don't give you that data.  So we'd
have to scan the directory and then open and stat each file.  This is
terribly expensive on Windows -- and the reason we have the fscache
layer (in the GfW version) to intercept the lstat() calls whenever
possible.

It might be possible to use the NTFS Master File Table to discover
this (very big handwave), but I would need to do a little digging.

This would all be NTFS specific.  FAT and other volume types would not
be covered.

Another thing to keep in mind is that the collision could be because
of case folding (or other such nonsense) on a directory in the path.
I mean, if someone on Linux builds a commit containing:

     a/b/c/D/e/foo.txt
     a/b/c/d/e/foo.txt

we'll get a similar collision as if one of them were spelled "FOO.txt".

Also, do we need to worry about hard-links or symlinks here?
If checkout populates symlinks, then you might have another collision
opportunity.  For example:

     a/b/c/D/e/foo.txt
     a/link -> ./b/c/d
     a/link/e/foo.txt

Also, some platforms (like the Mac) allow directory hard-links.
Granted, Git doesn't create hard-links during checkout, but the
user might.

I'm sure there are other edge cases here that make reporting
difficult; these are just a few I thought of.  I guess what I'm
trying to say is that as a first step just report that you found
a collision -- without trying to identify the set existing objects
that it collided with.

> 
> At any rate, until we have an actual plan for Windows, I think it would
> make sense only to split the cases into "has working inodes" and
> "other", and make sure "other" does something sensible in the meantime
> (like mention the conflict, but skip trying to list duplicates).

Yes, this should be split.  Do the "easy" Linux version first.
Keep in mind that there may also be a different solution for the Mac.

> When somebody wants to work on Windows support, then we can figure out
> if it just needs to wrap the "get unique identifier" operation, or if it
> would use a totally different algorithm.
> 
> -Peff
> 

Jeff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-03 18:23                                           ` Jeff Hostetler
@ 2018-08-03 18:49                                             ` Junio C Hamano
  2018-08-03 18:53                                             ` Jeff King
  1 sibling, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-03 18:49 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Jeff King, Duy Nguyen, Elijah Newren, Git Mailing List,
	Paweł Paruzel, brian m. carlson

Jeff Hostetler <git@jeffhostetler.com> writes:

> Another thing to keep in mind is that the collision could be because
> of case folding (or other such nonsense) on a directory in the path.
> I mean, if someone on Linux builds a commit containing:
>
>     a/b/c/D/e/foo.txt
>     a/b/c/d/e/foo.txt
>
> we'll get a similar collision as if one of them were spelled "FOO.txt".

I'd think the approach to teach checkout_entry() codepath to notice
it needed to unlink the existing file in order to check out the
entry it wanted to check out would cover this equally well.

> Also, do we need to worry about hard-links or symlinks here?

I do not think so.  You do not get a file with multiple hardlinks
in a "git clone" or "git checkout" result, and we do not check
things out beyond a symbolic link in the first place.

> If checkout populates symlinks, then you might have another collision
> opportunity.  For example:
>
>     a/b/c/D/e/foo.txt
>     a/link -> ./b/c/d
>     a/link/e/foo.txt

In other words, a tree with a/link (symlink) and a/link/<anything>
that requires a/link to be a symlink and a directory at the same
time cannot be created, so you won't get one with "git clone"

> Also, some platforms (like the Mac) allow directory hard-links.
> Granted, Git doesn't create hard-links during checkout, but the
> user might.

And we'd report "we are doing a fresh checkout immediately after a
clone and saw some file we haven't created, which may indicate a
case smashing filesystem glitch (or a competing third-party process
creating random files)", so noticing that would be a good thing, I
would think.

> I'm sure there are other edge cases here that make reporting
> difficult; these are just a few I thought of.  I guess what I'm
> trying to say is that as a first step just report that you found
> a collision -- without trying to identify the set existing objects
> that it collided with.

Yup, I think that is sensible.  If it can be done cheaply, i.e. on a
filesystem with trustable and cheap inum, after noticing such a
collision, go back and lstat() all paths in the index we have
checked out so far to see which ones are colliding, it adds useful
clue to the report, but noticing the collision in the first place
obviously has more value.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-03 18:23                                           ` Jeff Hostetler
  2018-08-03 18:49                                             ` Junio C Hamano
@ 2018-08-03 18:53                                             ` Jeff King
  2018-08-05 14:01                                               ` Jeff Hostetler
  1 sibling, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-03 18:53 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Duy Nguyen, Elijah Newren, Git Mailing List,
	Paweł Paruzel, brian m. carlson

On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote:

> > Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
> > maybe it's simply impossible. I don't know much about Windows. Some
> > searching implies that NTFS does have a "file index" concept which is
> > supposed to be unique.
> 
> This is hard and/or expensive on Windows.  Yes, you can get the
> "file index" values for an open file handle with a cost similar to
> an fstat().  Unfortunately, the FindFirst/FindNext routines (equivalent
> to the opendir/readdir routines), don't give you that data.  So we'd
> have to scan the directory and then open and stat each file.  This is
> terribly expensive on Windows -- and the reason we have the fscache
> layer (in the GfW version) to intercept the lstat() calls whenever
> possible.

I think that high cost might be OK for our purposes here. This code
would _only_ kick in during a clone, and then only on the error path
once we knew we had a collision during the checkout step.

> Another thing to keep in mind is that the collision could be because
> of case folding (or other such nonsense) on a directory in the path.
> I mean, if someone on Linux builds a commit containing:
> 
>     a/b/c/D/e/foo.txt
>     a/b/c/d/e/foo.txt
> 
> we'll get a similar collision as if one of them were spelled "FOO.txt".

True, though I think that may be OK. If you had conflicting directories
you'd get a _ton_ of duplicates listed, but that makes sense: you
actually have a ton of duplicates.

> Also, do we need to worry about hard-links or symlinks here?

I think we can ignore hardlinks. Git never creates them, and we know the
directory was empty when we started. Symlinks should be handled by using
lstat(). (Obviously that's for a Unix-ish platform).

> I'm sure there are other edge cases here that make reporting
> difficult; these are just a few I thought of.  I guess what I'm
> trying to say is that as a first step just report that you found
> a collision -- without trying to identify the set existing objects
> that it collided with.

I certainly don't disagree with that. :)

> > At any rate, until we have an actual plan for Windows, I think it would
> > make sense only to split the cases into "has working inodes" and
> > "other", and make sure "other" does something sensible in the meantime
> > (like mention the conflict, but skip trying to list duplicates).
> 
> Yes, this should be split.  Do the "easy" Linux version first.
> Keep in mind that there may also be a different solution for the Mac.

I assumed that an inode-based solution would work for Mac, since it's
mostly BSD under the hood. There may be subtleties I don't know about,
though.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
  2018-08-03 18:53                                             ` Jeff King
@ 2018-08-05 14:01                                               ` Jeff Hostetler
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Hostetler @ 2018-08-05 14:01 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Duy Nguyen, Elijah Newren, Git Mailing List,
	Paweł Paruzel, brian m. carlson



On 8/3/2018 2:53 PM, Jeff King wrote:
> On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote:
> 
>>> Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
>>> maybe it's simply impossible. I don't know much about Windows. Some
>>> searching implies that NTFS does have a "file index" concept which is
>>> supposed to be unique.
>>
>> This is hard and/or expensive on Windows.  Yes, you can get the
>> "file index" values for an open file handle with a cost similar to
>> an fstat().  Unfortunately, the FindFirst/FindNext routines (equivalent
>> to the opendir/readdir routines), don't give you that data.  So we'd
>> have to scan the directory and then open and stat each file.  This is
>> terribly expensive on Windows -- and the reason we have the fscache
>> layer (in the GfW version) to intercept the lstat() calls whenever
>> possible.
> 
> I think that high cost might be OK for our purposes here. This code
> would _only_ kick in during a clone, and then only on the error path
> once we knew we had a collision during the checkout step.
> 

Good point.

I've confirmed that the "file index" values can be used to determine
whether 2 path names are equivalent under NTFS for case variation,
final-dot/space, and short-names vs long-names.

I ran out of time this morning to search the directory for equivalent 
paths.  I'll look at that shortly.

Jeff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
                                     ` (2 preceding siblings ...)
  2018-07-31 19:13                   ` Junio C Hamano
@ 2018-08-07 19:01                   ` Nguyễn Thái Ngọc Duy
  2018-08-07 19:31                     ` Junio C Hamano
  2018-08-10 15:36                     ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy
  3 siblings, 2 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-07 19:01 UTC (permalink / raw)
  To: pclouds
  Cc: git, pawelparuzel95, peff, sandals, Elijah Newren, tboegi,
	Junio C Hamano, git

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

I do not make any suggestions to fix or workaround the problem because
there are many different options, especially when the problem comes
from folding rules other than case (e.g. UTF-8 normalization, Windows
special paths...)

In the previous iteration, inode has been suggested to find the
matching entry. But it is platform specific, and because we already
have a common function for matching stat, the function is used here
even if it's more expensive. Bonus point is we don't need some "#ifdef
platform" around this code.

The cost goes higher when we find duplicated entries at the bottom of
the index, but the number of these entries should be very small that
total extra cost should not be really noticeable.

This patch is tested with vim-colorschemes repository on a JFS partion
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 v2 has completely different approach so no point in sending
 interdiff.

 One nice thing about this is we don't need platform specific code for
 detecting the duplicate entries. I think ce_match_stat() works even
 on Windows. And it's now equally expensive on all platforms :D

 builtin/clone.c |  1 +
 cache.h         |  2 ++
 entry.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.c  | 23 +++++++++++++++++++++++
 unpack-trees.h  |  1 +
 5 files changed, 71 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9ebb5acf56..38d5609282 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -748,6 +748,7 @@ static int checkout(int submodule_progress)
 	memset(&opts, 0, sizeof opts);
 	opts.update = 1;
 	opts.merge = 1;
+	opts.clone = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/cache.h b/cache.h
index 8dc7134f00..cdf0984707 100644
--- a/cache.h
+++ b/cache.h
@@ -1515,9 +1515,11 @@ struct checkout {
 	const char *base_dir;
 	int base_dir_len;
 	struct delayed_checkout *delayed_checkout;
+	int *nr_duplicates;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
+		 clone:1,
 		 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..3917bfc874 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,47 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 	return lstat(path, st);
 }
 
+static void mark_duplicate_entries(const struct checkout *state,
+				   struct cache_entry *ce, struct stat *st)
+{
+	int i;
+	int *count = state->nr_duplicates;
+
+	if (!count)
+		BUG("state->nr_duplicates must not be NULL");
+
+	ce->ce_flags |= CE_MATCHED;
+	(*count)++;
+
+	if (!state->refresh_cache)
+		BUG("We need this to narrow down the set of updated entries");
+
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		/*
+		 * This entry has not been checked out yet, otherwise
+		 * its stat info must have been updated. And since we
+		 * check out from top to bottom, the rest is guaranteed
+		 * not checked out. Stop now.
+		 */
+		if (!ce_uptodate(dup))
+			break;
+
+		if (dup->ce_flags & CE_MATCHED)
+			continue;
+
+		if (ce_match_stat(dup, st,
+				  CE_MATCH_IGNORE_VALID |
+				  CE_MATCH_IGNORE_SKIP_WORKTREE))
+			continue;
+
+		dup->ce_flags |= CE_MATCHED;
+		(*count)++;
+		break;
+	}
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +496,9 @@ int checkout_entry(struct cache_entry *ce,
 			return -1;
 		}
 
+		if (state->clone)
+			mark_duplicate_entries(state, ce, &st);
+
 		/*
 		 * We unlink the old file, to get the new one with the
 		 * right permissions (including umask, which is nasty
diff --git a/unpack-trees.c b/unpack-trees.c
index f9efee0836..1b0c11142a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -344,12 +344,20 @@ static int check_updates(struct unpack_trees_options *o)
 	struct index_state *index = &o->result;
 	struct checkout state = CHECKOUT_INIT;
 	int i;
+	int nr_duplicates = 0;
 
 	state.force = 1;
 	state.quiet = 1;
 	state.refresh_cache = 1;
 	state.istate = index;
 
+	if (o->clone) {
+		state.clone = 1;
+		state.nr_duplicates = &nr_duplicates;
+		for (i = 0; i < index->cache_nr; i++)
+			index->cache[i]->ce_flags &= ~CE_MATCHED;
+	}
+
 	progress = get_progress(o);
 
 	if (o->update)
@@ -414,6 +422,21 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+	if (o->clone && state.nr_duplicates) {
+		warning(_("the following paths in this repository only differ in case\n"
+			  "from another path and will cause problems because you have cloned\n"
+			  "it on an case-insensitive filesytem:\n"));
+		for (i = 0; i < index->cache_nr; i++) {
+			struct cache_entry *ce = index->cache[i];
+
+			if (!(ce->ce_flags & CE_MATCHED))
+				continue;
+			fprintf(stderr, "  '%s'\n", ce->name);
+			ce->ce_flags &= ~CE_MATCHED;
+		}
+	}
+
 	return errs != 0;
 }
 
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..d940f1c5c2 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -42,6 +42,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     clone,
 		     index_only,
 		     nontrivial_merge,
 		     trivial_merges_only,
-- 
2.18.0.915.gd571298aae


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-07 19:01                   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
@ 2018-08-07 19:31                     ` Junio C Hamano
  2018-08-08 19:48                       ` Jeff Hostetler
  2018-08-10 15:36                     ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-07 19:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, pawelparuzel95, peff, sandals, Elijah Newren, tboegi, git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

>  One nice thing about this is we don't need platform specific code for
>  detecting the duplicate entries. I think ce_match_stat() works even
>  on Windows. And it's now equally expensive on all platforms :D

ce_match_stat() may not be a very good measure to see if two paths
refer to the same file, though.  After a fresh checkout, I would not
be surprised if two completely unrelated paths have the same size
and have same mtime/ctime.  In its original use case, i.e. "I have
one specific path in mind and took a snapshot of its metadata
earlier.  Is it still the same, or has it been touched?", that may
be sufficient to detect that the path has been _modified_, but
without reliable inum, it may be a poor measure to say two paths
refer to the same.

>  builtin/clone.c |  1 +
>  cache.h         |  2 ++
>  entry.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  unpack-trees.c  | 23 +++++++++++++++++++++++
>  unpack-trees.h  |  1 +
>  5 files changed, 71 insertions(+)

Having said that, it is pleasing to see that this can be achieved
with so little additional code.

> +static void mark_duplicate_entries(const struct checkout *state,
> +				   struct cache_entry *ce, struct stat *st)
> +{
> +	int i;
> +	int *count = state->nr_duplicates;
> +
> +	if (!count)
> +		BUG("state->nr_duplicates must not be NULL");
> +
> +	ce->ce_flags |= CE_MATCHED;
> +	(*count)++;
> +
> +	if (!state->refresh_cache)
> +		BUG("We need this to narrow down the set of updated entries");
> +
> +	for (i = 0; i < state->istate->cache_nr; i++) {
> +		struct cache_entry *dup = state->istate->cache[i];
> +
> +		/*
> +		 * This entry has not been checked out yet, otherwise
> +		 * its stat info must have been updated. And since we
> +		 * check out from top to bottom, the rest is guaranteed
> +		 * not checked out. Stop now.
> +		 */
> +		if (!ce_uptodate(dup))
> +			break;
> +
> +		if (dup->ce_flags & CE_MATCHED)
> +			continue;
> +
> +		if (ce_match_stat(dup, st,
> +				  CE_MATCH_IGNORE_VALID |
> +				  CE_MATCH_IGNORE_SKIP_WORKTREE))
> +			continue;
> +
> +		dup->ce_flags |= CE_MATCHED;
> +		(*count)++;
> +		break;
> +	}
> +}
> +

Hmph.  If there is only one true collision, then all its aliases
will be marked with CE_MATCHED bit every time the second and the
subsequent alias is checked out (as the caller calls this function
when it noticed that something already is at the path ce wants to
go).  But if there are two sets of colliding paths, because there is
only one bit used, we do not group the paths into these two sets and
report, e.g. "blue.txt, BLUE.txt and BLUE.TXT collide.  red.txt and
RED.txt also collide."  I am not sure if computing that is too much
work for too little gain, but because this is in an error codepath,
it may be worth doing.  I dunno.

> +
> +	if (o->clone && state.nr_duplicates) {
> +		warning(_("the following paths in this repository only differ in case\n"
> +			  "from another path and will cause problems because you have cloned\n"
> +			  "it on an case-insensitive filesytem:\n"));

With the new approach, we no longer preemptively detect that the
project will be harmed by a case smashing filesystems before it
happens.  This instead reports that the project has already been
harmed on _this_ system by such a filesystem after the fact.

So from the end-user's point of view, "will cause problems" may be a
message that came a bit too late.  "have collided and only one from
the same colliding group is in the working tree; others failed to be
checked out" is probably closer to the truth.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-07 19:31                     ` Junio C Hamano
@ 2018-08-08 19:48                       ` Jeff Hostetler
  2018-08-08 22:31                         ` Jeff King
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff Hostetler @ 2018-08-08 19:48 UTC (permalink / raw)
  To: Junio C Hamano, Nguyễn Thái Ngọc Duy
  Cc: git, pawelparuzel95, peff, sandals, Elijah Newren, tboegi



On 8/7/2018 3:31 PM, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
> 
>>   One nice thing about this is we don't need platform specific code for
>>   detecting the duplicate entries. I think ce_match_stat() works even
>>   on Windows. And it's now equally expensive on all platforms :D
> 
> ce_match_stat() may not be a very good measure to see if two paths
> refer to the same file, though.  After a fresh checkout, I would not
> be surprised if two completely unrelated paths have the same size
> and have same mtime/ctime.  In its original use case, i.e. "I have
> one specific path in mind and took a snapshot of its metadata
> earlier.  Is it still the same, or has it been touched?", that may
> be sufficient to detect that the path has been _modified_, but
> without reliable inum, it may be a poor measure to say two paths
> refer to the same.

I agree with Junio on this one.  The mtime values are sloppy at best.
On FAT file systems, they have 2 second resolution.  Even NTFS IIRC
has only 100ns resolution, so we might get a lot of false matches
using this technique, right?

It might be better to build an equivalence-class hash-map for the
colliding entries.  Compute a "normalized" version of the pathname
(such as convert to lowercase, strip final-dots/spaces, strip the
digits following tilda of a shortname, and etc for the MAC's UTF-isms).
Then when you rescan the index entries to find the matches, apply the
equivalence operator on the pathname and do the hashmap lookup.
When you find a match, you have a "potential" collider pair (I say
potential only because of the ambiguity of shortnames).  Then we
can use inum/file-index/whatever to see if they actually collide.


Jeff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-08 19:48                       ` Jeff Hostetler
@ 2018-08-08 22:31                         ` Jeff King
  2018-08-09  0:41                           ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-08 22:31 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, git,
	pawelparuzel95, sandals, Elijah Newren, tboegi

On Wed, Aug 08, 2018 at 03:48:04PM -0400, Jeff Hostetler wrote:

> > ce_match_stat() may not be a very good measure to see if two paths
> > refer to the same file, though.  After a fresh checkout, I would not
> > be surprised if two completely unrelated paths have the same size
> > and have same mtime/ctime.  In its original use case, i.e. "I have
> > one specific path in mind and took a snapshot of its metadata
> > earlier.  Is it still the same, or has it been touched?", that may
> > be sufficient to detect that the path has been _modified_, but
> > without reliable inum, it may be a poor measure to say two paths
> > refer to the same.
> 
> I agree with Junio on this one.  The mtime values are sloppy at best.
> On FAT file systems, they have 2 second resolution.  Even NTFS IIRC
> has only 100ns resolution, so we might get a lot of false matches
> using this technique, right?

Yeah, I think anything less than inode (or some system equivalent) is
going to be too flaky.

> It might be better to build an equivalence-class hash-map for the
> colliding entries.  Compute a "normalized" version of the pathname
> (such as convert to lowercase, strip final-dots/spaces, strip the
> digits following tilda of a shortname, and etc for the MAC's UTF-isms).
> Then when you rescan the index entries to find the matches, apply the
> equivalence operator on the pathname and do the hashmap lookup.
> When you find a match, you have a "potential" collider pair (I say
> potential only because of the ambiguity of shortnames).  Then we
> can use inum/file-index/whatever to see if they actually collide.

I think we really want to avoid doing that normalization ourselves if we
can. There are just too many filesystem-specific rules.

If we have an equivalence-class hashmap and feed it inodes (or again,
some system equivalent) as the keys, we should get buckets of
collisions. I started to write a "something like this..." earlier, but
got bogged down in boilerplate around the C hashmap.

But here it is in perl. ;)

-- >8 --
# pretend we have these paths in our index
paths='foo FOO and some other paths'

# create them; this will make a single path on a case-insensitive system
for i in $paths; do
  echo $i >$i
done

# now find the duplicates
perl -le '
  for my $path (@ARGV) {
    # this would be an ntfs unique-id on Windows
    my $inode = (lstat($path))[1];
    push @{$h{$inode}}, $path;
  }

  for my $group (grep { @$_ > 1 } values(%h)) {
    print "group:";
    print "  ", $_ for (@$group);
  }
' $paths
-- >8 --

which should show the obvious pair (it does for me on vfat-on-linux,
though where it gets those inodes from, I have no idea ;) ).

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-08 22:31                         ` Jeff King
@ 2018-08-09  0:41                           ` Junio C Hamano
  2018-08-09 14:23                             ` Jeff King
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-09  0:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeff Hostetler, Nguyễn Thái Ngọc Duy, git,
	pawelparuzel95, sandals, Elijah Newren, tboegi

Jeff King <peff@peff.net> writes:

> I think we really want to avoid doing that normalization ourselves if we
> can. There are just too many filesystem-specific rules.

Exactly; not having to learn these rules is the major (if not whole)
point of the "let checkout notice the collision and then deal with
it" approach.  Let's not forget that.

> If we have an equivalence-class hashmap and feed it inodes (or again,
> some system equivalent) as the keys, we should get buckets of
> collisions.

I guess one way to get "some system equivalent" that can be used as
the last resort, when there absolutely is no inum equivalent, is to
rehash the working tree file that shouldn't be there when we detect
a collision.

If we found that there is something when we tried to write out
"Foo.txt", if we open "Foo.txt" on the working tree and hash-object
it, we should find the matching blob somewhere in the index _before_
"Foo.txt".  On a case-insensitive filesytem, it may well be
"foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
only differ in case.

Of course, that's really the last resort, as it would be costly, but
this is something that only need to happen on the "unusual" case in
the error codepath, so...

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09  0:41                           ` Junio C Hamano
@ 2018-08-09 14:23                             ` Jeff King
  2018-08-09 21:14                               ` Jeff Hostetler
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-09 14:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff Hostetler, Nguyễn Thái Ngọc Duy, git,
	pawelparuzel95, sandals, Elijah Newren, tboegi

On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote:

> > If we have an equivalence-class hashmap and feed it inodes (or again,
> > some system equivalent) as the keys, we should get buckets of
> > collisions.
> 
> I guess one way to get "some system equivalent" that can be used as
> the last resort, when there absolutely is no inum equivalent, is to
> rehash the working tree file that shouldn't be there when we detect
> a collision.
> 
> If we found that there is something when we tried to write out
> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object
> it, we should find the matching blob somewhere in the index _before_
> "Foo.txt".  On a case-insensitive filesytem, it may well be
> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
> only differ in case.

Clever. You might still run into false positives when there is
duplicated content in the repository (especially, say, zero-length
files).  But the fact that you only do the hashing on known duplicates
helps with that.

One of the things I did like about the equivalence-class approach is
that it can be done in a single linear pass in the worst case. Whereas
anything that searches when we see a collision is quite likely to be
quadratic. But as I said before, it may not be worth worrying too much
about that for an error code path where we expect the number of
collisions to be small.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 14:23                             ` Jeff King
@ 2018-08-09 21:14                               ` Jeff Hostetler
  2018-08-09 21:34                                 ` Jeff King
  2018-08-09 21:40                                 ` Elijah Newren
  0 siblings, 2 replies; 96+ messages in thread
From: Jeff Hostetler @ 2018-08-09 21:14 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, git, pawelparuzel95,
	sandals, Elijah Newren, tboegi



On 8/9/2018 10:23 AM, Jeff King wrote:
> On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote:
> 
>>> If we have an equivalence-class hashmap and feed it inodes (or again,
>>> some system equivalent) as the keys, we should get buckets of
>>> collisions.
>>
>> I guess one way to get "some system equivalent" that can be used as
>> the last resort, when there absolutely is no inum equivalent, is to
>> rehash the working tree file that shouldn't be there when we detect
>> a collision.
>>
>> If we found that there is something when we tried to write out
>> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object
>> it, we should find the matching blob somewhere in the index _before_
>> "Foo.txt".  On a case-insensitive filesytem, it may well be
>> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
>> only differ in case.
> 
> Clever. You might still run into false positives when there is
> duplicated content in the repository (especially, say, zero-length
> files).  But the fact that you only do the hashing on known duplicates
> helps with that.
> 

I worry that the false positives make this a non-starter.  I mean, if
clone creates files 'A' and 'B' (both equal) and then tries to create
'b', would the collision code reports that 'b' collided with 'A' because
that was the first OID match?  Ideally with this scheme we'd have to
search the entire index prior to 'b' and then report that 'b' collided
with either 'A' or 'B'.  Neither message instills confidence.  And
there's no way to prefer answer 'B' over 'A' without using knowledge
of the FS name mangling/aliasing rules -- unless we want to just assume
ignore-case for this iteration.

> One of the things I did like about the equivalence-class approach is
> that it can be done in a single linear pass in the worst case. Whereas
> anything that searches when we see a collision is quite likely to be
> quadratic. But as I said before, it may not be worth worrying too much
> about that for an error code path where we expect the number of
> collisions to be small.
> 
> -Peff
> 

Sorry to be paranoid, but I have an index with 3.5M entries, the word
"quadratic" rings all kinds of alarm bells for me.  :-)

Granted, we expect the number of collisions to be small, but searching
back for each collision over the already-populated portion of the index
could be expensive.

Jeff


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:14                               ` Jeff Hostetler
@ 2018-08-09 21:34                                 ` Jeff King
  2018-08-09 21:40                                 ` Elijah Newren
  1 sibling, 0 replies; 96+ messages in thread
From: Jeff King @ 2018-08-09 21:34 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, git,
	pawelparuzel95, sandals, Elijah Newren, tboegi

On Thu, Aug 09, 2018 at 05:14:16PM -0400, Jeff Hostetler wrote:

> > Clever. You might still run into false positives when there is
> > duplicated content in the repository (especially, say, zero-length
> > files).  But the fact that you only do the hashing on known duplicates
> > helps with that.
> > 
> 
> I worry that the false positives make this a non-starter.  I mean, if
> clone creates files 'A' and 'B' (both equal) and then tries to create
> 'b', would the collision code reports that 'b' collided with 'A' because
> that was the first OID match?  Ideally with this scheme we'd have to
> search the entire index prior to 'b' and then report that 'b' collided
> with either 'A' or 'B'.  Neither message instills confidence.  And
> there's no way to prefer answer 'B' over 'A' without using knowledge
> of the FS name mangling/aliasing rules -- unless we want to just assume
> ignore-case for this iteration.

Yeah. If we can get usable unique ids (inode or otherwise) of some form
on each system, I think I prefer that. It's much easier to reason about.

> Sorry to be paranoid, but I have an index with 3.5M entries, the word
> "quadratic" rings all kinds of alarm bells for me.  :-)
> 
> Granted, we expect the number of collisions to be small, but searching
> back for each collision over the already-populated portion of the index
> could be expensive.

Heh. Yeah, it's really O(n*m), and we expect a small "m". But of course
the two are equal in the worst case.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:14                               ` Jeff Hostetler
  2018-08-09 21:34                                 ` Jeff King
@ 2018-08-09 21:40                                 ` Elijah Newren
  2018-08-09 21:44                                   ` Jeff King
  2018-08-09 22:07                                   ` Junio C Hamano
  1 sibling, 2 replies; 96+ messages in thread
From: Elijah Newren @ 2018-08-09 21:40 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc,
	Git Mailing List, pawelparuzel95, brian m. carlson,
	Torsten Bögershausen

On Thu, Aug 9, 2018 at 2:14 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> On 8/9/2018 10:23 AM, Jeff King wrote:
> > On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote:
> >> If we found that there is something when we tried to write out
> >> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object
> >> it, we should find the matching blob somewhere in the index _before_
> >> "Foo.txt".  On a case-insensitive filesytem, it may well be
> >> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
> >> only differ in case.
> >
> > Clever. You might still run into false positives when there is
> > duplicated content in the repository (especially, say, zero-length
> > files).  But the fact that you only do the hashing on known duplicates
> > helps with that.
>
> I worry that the false positives make this a non-starter.  I mean, if
> clone creates files 'A' and 'B' (both equal) and then tries to create
> 'b', would the collision code reports that 'b' collided with 'A' because
> that was the first OID match?  Ideally with this scheme we'd have to
> search the entire index prior to 'b' and then report that 'b' collided
> with either 'A' or 'B'.  Neither message instills confidence.  And
> there's no way to prefer answer 'B' over 'A' without using knowledge
> of the FS name mangling/aliasing rules -- unless we want to just assume
> ignore-case for this iteration.

A possibly crazy idea: Don't bother reporting the other filename; just
report the OID instead.

"Error: Foo.txt cannot be checked out because another file with hash
<whatever> is in the way."  Maybe even add a hint for the user: "Run
`git ls-files -s` to see see all files and their hash".

Whatever the exact wording for the error message, just create a nice
post on stackoverflow.com explaining the various weird filesystems out
there (VFAT, NTFS, HFS, APFS, etc) and how they cause differing
filenames to be written to the same location.  Have a bunch of folks
vote it up so it has some nice search-engine juice.


The error message isn't quite as good, but does the user really need
all the names of the file?  If so, we gave them enough information to
figure it out, and this is a really unusual case anyway, right?
Besides, now we're back to linear performance....

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:40                                 ` Elijah Newren
@ 2018-08-09 21:44                                   ` Jeff King
  2018-08-09 21:53                                     ` Elijah Newren
  2018-08-09 22:07                                   ` Junio C Hamano
  1 sibling, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-09 21:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jeff Hostetler, Junio C Hamano,
	Nguyễn Thái Ngọc, Git Mailing List,
	pawelparuzel95, brian m. carlson, Torsten Bögershausen

On Thu, Aug 09, 2018 at 02:40:58PM -0700, Elijah Newren wrote:

> > I worry that the false positives make this a non-starter.  I mean, if
> > clone creates files 'A' and 'B' (both equal) and then tries to create
> > 'b', would the collision code reports that 'b' collided with 'A' because
> > that was the first OID match?  Ideally with this scheme we'd have to
> > search the entire index prior to 'b' and then report that 'b' collided
> > with either 'A' or 'B'.  Neither message instills confidence.  And
> > there's no way to prefer answer 'B' over 'A' without using knowledge
> > of the FS name mangling/aliasing rules -- unless we want to just assume
> > ignore-case for this iteration.
> 
> A possibly crazy idea: Don't bother reporting the other filename; just
> report the OID instead.
> 
> "Error: Foo.txt cannot be checked out because another file with hash
> <whatever> is in the way."  Maybe even add a hint for the user: "Run
> `git ls-files -s` to see see all files and their hash".
> 
> Whatever the exact wording for the error message, just create a nice
> post on stackoverflow.com explaining the various weird filesystems out
> there (VFAT, NTFS, HFS, APFS, etc) and how they cause differing
> filenames to be written to the same location.  Have a bunch of folks
> vote it up so it has some nice search-engine juice.

Actually, I kind of like the simplicity of that. It puts the human brain
in the loop.

> The error message isn't quite as good, but does the user really need
> all the names of the file?  If so, we gave them enough information to
> figure it out, and this is a really unusual case anyway, right?
> Besides, now we're back to linear performance....

Well, it's still quadratic when they run O(n) iterations of "git
ls-files -s | grep $colliding_oid". You've just pushed the second linear
search onto the user. ;)

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:44                                   ` Jeff King
@ 2018-08-09 21:53                                     ` Elijah Newren
  2018-08-09 21:59                                       ` Jeff King
  0 siblings, 1 reply; 96+ messages in thread
From: Elijah Newren @ 2018-08-09 21:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeff Hostetler, Junio C Hamano,
	Nguyễn Thái Ngọc, Git Mailing List,
	pawelparuzel95, brian m. carlson, Torsten Bögershausen

On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@peff.net> wrote:
> > The error message isn't quite as good, but does the user really need
> > all the names of the file?  If so, we gave them enough information to
> > figure it out, and this is a really unusual case anyway, right?
> > Besides, now we're back to linear performance....
>
> Well, it's still quadratic when they run O(n) iterations of "git
> ls-files -s | grep $colliding_oid". You've just pushed the second linear
> search onto the user. ;)

Wouldn't that be their own fault for not running
  git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2
?   ;-)

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:53                                     ` Elijah Newren
@ 2018-08-09 21:59                                       ` Jeff King
  2018-08-09 23:05                                         ` Elijah Newren
  0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2018-08-09 21:59 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jeff Hostetler, Junio C Hamano,
	Nguyễn Thái Ngọc, Git Mailing List,
	pawelparuzel95, brian m. carlson, Torsten Bögershausen

On Thu, Aug 09, 2018 at 02:53:42PM -0700, Elijah Newren wrote:

> On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@peff.net> wrote:
> > > The error message isn't quite as good, but does the user really need
> > > all the names of the file?  If so, we gave them enough information to
> > > figure it out, and this is a really unusual case anyway, right?
> > > Besides, now we're back to linear performance....
> >
> > Well, it's still quadratic when they run O(n) iterations of "git
> > ls-files -s | grep $colliding_oid". You've just pushed the second linear
> > search onto the user. ;)
> 
> Wouldn't that be their own fault for not running
>   git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2
> ?   ;-)

Man, this thread is the gift that keeps on giving. :)

That's still quadratic, isn't it? You've just hidden the second
dimension in the single grep call.

Now since these are all going to be constant strings, in theory an
intelligent grep could stick them all in a search trie, and match each
line with complexity k, the length of the matched strings. And since
k=40, that's technically still linear overall.

-Peff

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:40                                 ` Elijah Newren
  2018-08-09 21:44                                   ` Jeff King
@ 2018-08-09 22:07                                   ` Junio C Hamano
  1 sibling, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-09 22:07 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jeff Hostetler, Jeff King, Nguyễn Thái Ngọc,
	Git Mailing List, pawelparuzel95, brian m. carlson,
	Torsten Bögershausen

Elijah Newren <newren@gmail.com> writes:

> A possibly crazy idea: Don't bother reporting the other filename; just
> report the OID instead.
>
> "Error: Foo.txt cannot be checked out because another file with hash
> <whatever> is in the way."  Maybe even add a hint for the user: "Run
> `git ls-files -s` to see see all files and their hash".

Once we start using OID to talk to humans, we are already lost.  At
that point, it would be 1000% better to

 - not check out Foo.txt to unlink and overwrite; instead leave the
   content that is already in the working tree as-is; and

 - report that Foo.txt was not checked out as something else that
   also claims to deserve that path was checked out already.

Then the user can inspect Foo.txt and realize it is actually the
contents that should be in foo.txt or whatever.



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
  2018-08-09 21:59                                       ` Jeff King
@ 2018-08-09 23:05                                         ` Elijah Newren
  0 siblings, 0 replies; 96+ messages in thread
From: Elijah Newren @ 2018-08-09 23:05 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeff Hostetler, Junio C Hamano,
	Nguyễn Thái Ngọc, Git Mailing List,
	pawelparuzel95, brian m. carlson, Torsten Bögershausen

On Thu, Aug 9, 2018 at 2:59 PM Jeff King <peff@peff.net> wrote:
> On Thu, Aug 09, 2018 at 02:53:42PM -0700, Elijah Newren wrote:
>
> > On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@peff.net> wrote:
> > > > The error message isn't quite as good, but does the user really need
> > > > all the names of the file?  If so, we gave them enough information to
> > > > figure it out, and this is a really unusual case anyway, right?
> > > > Besides, now we're back to linear performance....
> > >
> > > Well, it's still quadratic when they run O(n) iterations of "git
> > > ls-files -s | grep $colliding_oid". You've just pushed the second linear
> > > search onto the user. ;)
> >
> > Wouldn't that be their own fault for not running
> >   git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2
> > ?   ;-)
>
> Man, this thread is the gift that keeps on giving. :)
>
> That's still quadratic, isn't it? You've just hidden the second
> dimension in the single grep call.

It may depend on the implementation within grep.  If I had remembered
to pass -F, and if wikipedia's claims[1] about the Aho-Corasick
algotihrm being the basis of the original Unix command fgrep, and it's
claims that this algorithm is "linear in the length of the strings
plus the length of the searched text plus the number of output
matches", and I didn't just misunderstand something there, then it
looks like it could be linear.

[1] https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

Of course, the grep implementation could also do something stupid and
provide dramatically worse behavior than running the command N times
specifying one pattern each time.[2]

[2] http://savannah.gnu.org/bugs/?16305

> Now since these are all going to be constant strings, in theory an
> intelligent grep could stick them all in a search trie, and match each
> line with complexity k, the length of the matched strings. And since
> k=40, that's technically still linear overall.

Looks like Aho-Corasick uses "a finite-state machine that resembles a
trie", so definitely along those lines.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v3 0/1] clone: warn on colidding entries on checkout
  2018-08-07 19:01                   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  2018-08-07 19:31                     ` Junio C Hamano
@ 2018-08-10 15:36                     ` Nguyễn Thái Ngọc Duy
  2018-08-10 15:36                       ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
                                         ` (2 more replies)
  1 sibling, 3 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-10 15:36 UTC (permalink / raw)
  To: pclouds; +Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi

There are lots of suggestions on optimizing this stuff, but since this
problem does not affect me to begin with,  I'm reluctant to make more
changes and going to stay simple, stupid and slow. I could continue to
do small updates if needed. But for bigger changes, consider this
patch dropped by me.

v3 now uses inode on UNIXy platforms for checking colliding items. I
still don't try to separate colliding groups because it should be
quite obvious once you look at the colliding list (and most of the
time I suspect we only have one or two groups).

Since on Windows we can't really have colliding groups (no inode to
check) so the warning message is to me a bit misleading. But I frankly
don't want to put more effort in this.

Nguyễn Thái Ngọc Duy (1):
  clone: report duplicate entries on case-insensitive filesystems

 builtin/clone.c |  1 +
 cache.h         |  2 ++
 entry.c         | 32 ++++++++++++++++++++++++++++++++
 unpack-trees.c  | 22 ++++++++++++++++++++++
 unpack-trees.h  |  1 +
 5 files changed, 58 insertions(+)

-- 
2.18.0.915.gd571298aae


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems
  2018-08-10 15:36                     ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy
@ 2018-08-10 15:36                       ` Nguyễn Thái Ngọc Duy
  2018-08-10 16:42                         ` Junio C Hamano
  2018-08-11 10:09                         ` SZEDER Gábor
  2018-08-10 16:12                       ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Junio C Hamano
  2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2 siblings, 2 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-10 15:36 UTC (permalink / raw)
  To: pclouds; +Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

This patch is tested with vim-colorschemes repository on a JFS partition
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c |  1 +
 cache.h         |  2 ++
 entry.c         | 32 ++++++++++++++++++++++++++++++++
 unpack-trees.c  | 22 ++++++++++++++++++++++
 unpack-trees.h  |  1 +
 5 files changed, 58 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9ebb5acf56..38d5609282 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -748,6 +748,7 @@ static int checkout(int submodule_progress)
 	memset(&opts, 0, sizeof opts);
 	opts.update = 1;
 	opts.merge = 1;
+	opts.clone = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/cache.h b/cache.h
index 8dc7134f00..cdf0984707 100644
--- a/cache.h
+++ b/cache.h
@@ -1515,9 +1515,11 @@ struct checkout {
 	const char *base_dir;
 	int base_dir_len;
 	struct delayed_checkout *delayed_checkout;
+	int *nr_duplicates;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
+		 clone:1,
 		 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..f2d73e6255 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,35 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 	return lstat(path, st);
 }
 
+static void mark_duplicate_entries(const struct checkout *state,
+				   struct cache_entry *ce, struct stat *st)
+{
+	int i;
+	int *count = state->nr_duplicates;
+
+	if (!count)
+		BUG("state->nr_duplicates must not be NULL");
+
+	ce->ce_flags |= CE_MATCHED;
+	(*count)++;
+
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+			continue;
+
+		if (ce_uptodate(dup) &&
+		    dup->ce_stat_data.sd_ino == st->st_ino) {
+			dup->ce_flags |= CE_MATCHED;
+			(*count)++;
+			break;
+		}
+	}
+#endif
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +484,9 @@ int checkout_entry(struct cache_entry *ce,
 			return -1;
 		}
 
+		if (state->clone)
+			mark_duplicate_entries(state, ce, &st);
+
 		/*
 		 * We unlink the old file, to get the new one with the
 		 * right permissions (including umask, which is nasty
diff --git a/unpack-trees.c b/unpack-trees.c
index f9efee0836..d4fece913c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -344,12 +344,20 @@ static int check_updates(struct unpack_trees_options *o)
 	struct index_state *index = &o->result;
 	struct checkout state = CHECKOUT_INIT;
 	int i;
+	int nr_duplicates = 0;
 
 	state.force = 1;
 	state.quiet = 1;
 	state.refresh_cache = 1;
 	state.istate = index;
 
+	if (o->clone) {
+		state.clone = 1;
+		state.nr_duplicates = &nr_duplicates;
+		for (i = 0; i < index->cache_nr; i++)
+			index->cache[i]->ce_flags &= ~CE_MATCHED;
+	}
+
 	progress = get_progress(o);
 
 	if (o->update)
@@ -414,6 +422,20 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+	if (o->clone && state.nr_duplicates) {
+		warning(_("the following paths have collided and only one from the same\n"
+			  "colliding group is in the working tree:\n"));
+		for (i = 0; i < index->cache_nr; i++) {
+			struct cache_entry *ce = index->cache[i];
+
+			if (!(ce->ce_flags & CE_MATCHED))
+				continue;
+			fprintf(stderr, "  '%s'\n", ce->name);
+			ce->ce_flags &= ~CE_MATCHED;
+		}
+	}
+
 	return errs != 0;
 }
 
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..d940f1c5c2 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -42,6 +42,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     clone,
 		     index_only,
 		     nontrivial_merge,
 		     trivial_merges_only,
-- 
2.18.0.915.gd571298aae


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 0/1] clone: warn on colidding entries on checkout
  2018-08-10 15:36                     ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy
  2018-08-10 15:36                       ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
@ 2018-08-10 16:12                       ` Junio C Hamano
  2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-10 16:12 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, newren, pawelparuzel95, peff, sandals, tboegi

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> There are lots of suggestions on optimizing this stuff, but since this
> problem does not affect me to begin with,  I'm reluctant to make more
> changes and going to stay simple, stupid and slow. I could continue to
> do small updates if needed. But for bigger changes, consider this
> patch dropped by me.
>
> v3 now uses inode on UNIXy platforms for checking colliding items. I
> still don't try to separate colliding groups because it should be
> quite obvious once you look at the colliding list (and most of the
> time I suspect we only have one or two groups).

I think that design decision is fine.  We can extend it later if
needed, but I would not be surprised if what you have here is
sufficient.

Another possible follow-up in the future may be to encapsulate the
"I have a cache-entry 'dup', and stat data 'st' taken for a path
in the working tree.  Does it look likely that the latter is the
result of checking out the former?" logic, which you currently has a
hard-coded if() statement condition, into a helper function and
make its implementation platform dependent.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems
  2018-08-10 15:36                       ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
@ 2018-08-10 16:42                         ` Junio C Hamano
  2018-08-11 10:09                         ` SZEDER Gábor
  1 sibling, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-10 16:42 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, newren, pawelparuzel95, peff, sandals, tboegi

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> +static void mark_duplicate_entries(const struct checkout *state,
> +				   struct cache_entry *ce, struct stat *st)
> +{
> +	int i;
> +	int *count = state->nr_duplicates;
> +
> +	if (!count)
> +		BUG("state->nr_duplicates must not be NULL");
> +
> +	ce->ce_flags |= CE_MATCHED;
> +	(*count)++;

We tried to check out ce and found out that the path is already
occupied, whose stat data is in st.  We increment (*count)++ to
signal the fact that ce itself had trouble.

> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> +	for (i = 0; i < state->istate->cache_nr; i++) {

Do we want to count up all the way?  IOW, don't we want to stop when
we see the ce itself?

> +		struct cache_entry *dup = state->istate->cache[i];
> +
> +		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> +			continue;
> +
> +		if (ce_uptodate(dup) &&
> +		    dup->ce_stat_data.sd_ino == st->st_ino) {
> +			dup->ce_flags |= CE_MATCHED;

Is checking ce_uptodate(dup) done to check that the 'dup' is a
freshly checked out entry?  I'd like to understand why it is
necessary, especially beause you know you only call this function in
the o->clone codepath, so everything ought to be fresh.

I agree that the check you have above is sufficient on inum capable
systems.  I also suspect that Windows folks, if they really wish to
report which other path(s) are colliding with ce, would want to
replace this whole loop with a platform specific helper function,
not just the condition in the above "if" statement [*1*], so what we
see here should be sufficient for now.

	Side note #1.  It is sufficient for inum capable systems to
	look at 'dup' and 'st' to cheaply identify collision, but it
	may be more efficient for other systems to look at ce and
	the previous entries of the index, which allows them to use
	platform specific API that knows case-folding and other
	pathname munging specifics.  Going that way lets them
	minimally emulate 'struct stat', which may be expensive to
	fill.

> +			(*count)++;

And then we increment (*count)++ for this _other_ one that
collided with ce.

> +			break;

And we leave the loop.  There may be somebody else that collided
when 'dup' was checked out, making this triple collision, but in
such a case, we won't be coming this far in this loop (i.e.  we
would have continued on this 'dup' as it is marked as CE_MATCHED),
so there is no reason to continue looking for further collision with
'ce' here.  So nr_duplicates kept track by these (*count)++ will
indicate number of paths involved in any collision correctly, I
think.

Makes sense.

> +		}
> +	}
> +#endif
> +}
> +
>  /*
>   * Write the contents from ce out to the working tree.
>   *
> @@ -455,6 +484,9 @@ int checkout_entry(struct cache_entry *ce,
>  			return -1;
>  		}
>  
> +		if (state->clone)
> +			mark_duplicate_entries(state, ce, &st);
> +
>  		/*
>  		 * We unlink the old file, to get the new one with the
>  		 * right permissions (including umask, which is nasty
> diff --git a/unpack-trees.c b/unpack-trees.c
> index f9efee0836..d4fece913c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -344,12 +344,20 @@ static int check_updates(struct unpack_trees_options *o)
>  	struct index_state *index = &o->result;
>  	struct checkout state = CHECKOUT_INIT;
>  	int i;
> +	int nr_duplicates = 0;
>  
>  	state.force = 1;
>  	state.quiet = 1;
>  	state.refresh_cache = 1;
>  	state.istate = index;
>  
> +	if (o->clone) {
> +		state.clone = 1;
> +		state.nr_duplicates = &nr_duplicates;
> +		for (i = 0; i < index->cache_nr; i++)
> +			index->cache[i]->ce_flags &= ~CE_MATCHED;
> +	}
> +
>  	progress = get_progress(o);
>  
>  	if (o->update)
> @@ -414,6 +422,20 @@ static int check_updates(struct unpack_trees_options *o)
>  	errs |= finish_delayed_checkout(&state);
>  	if (o->update)
>  		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
> +
> +	if (o->clone && state.nr_duplicates) {

Does state.nr_duplicates have to be a pointer to int?  I am
wondering if state.clone bit is sufficient to protect codepaths that
needs to spend cycles to keep track of that number, in which case we
can do without local variable nr_duplicates here, and the set-up
code in the previous hunk not to store the pointer to it in
state.nr_duplicates (instead, that field itself can become an int).

> +		warning(_("the following paths have collided and only one from the same\n"
> +			  "colliding group is in the working tree:\n"));

As we are not grouping (and do not particularly see the need to
group), perhaps we can do without mentioning group in the warning?
e.g.

	some of the following may have been checked out incorrectly
	due to limitation of the filesystem like case insensitivity.


> +		for (i = 0; i < index->cache_nr; i++) {
> +			struct cache_entry *ce = index->cache[i];
> +
> +			if (!(ce->ce_flags & CE_MATCHED))
> +				continue;
> +			fprintf(stderr, "  '%s'\n", ce->name);
> +			ce->ce_flags &= ~CE_MATCHED;
> +		}
> +	}

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems
  2018-08-10 15:36                       ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-08-10 16:42                         ` Junio C Hamano
@ 2018-08-11 10:09                         ` SZEDER Gábor
  2018-08-11 13:16                           ` Duy Nguyen
  1 sibling, 1 reply; 96+ messages in thread
From: SZEDER Gábor @ 2018-08-11 10:09 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: SZEDER Gábor, git, git, gitster, newren, pawelparuzel95,
	peff, sandals, tboegi


> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

This patch makes 'clone http repository' in
't5551-http-fetch-smart.sh' fail with:

  --- exp 2018-08-11 02:29:45.216641851 +0000
  +++ actual.smudged      2018-08-11 02:29:45.264642318 +0000
  @@ -15,3 +15,5 @@
   < Pragma: no-cache
   < Cache-Control: no-cache, max-age=0, must-revalidate
   < Content-Type: application/x-git-upload-pack-result
  +> warning: the following paths have collided and only one from the same
  +> colliding group is in the working tree:


This highlights a few issues:

  - This test runs

      GIT_TRACE_CURL=true git clone --quiet <URL> 2>err

    i.e. the curl trace and any errors or warnings from 'git clone'
    end up in the same file.  This test then removes a lot of
    uninteresting headers before the comparison with 'test_cmp', but
    the warning remains and then triggers the test failure.

    Several other tests run a command like this, but those don't use
    'test_cmp', but only grep the output to verify the presence or
    absence of certain headers.

    I'm inclined to think that it would be prudent to change all these
    tests to send the curl trace to a dedicated file (and then
    '--quiet' can be removed as well).  Though, arguably, had that
    been already the case, this test wouldn't have failed, and we
    probably wouldn't have noticed that something is wrong.

  - But what triggered this warning in the first place?  'git clone'
    didn't print anything after the that warning, even when I re-run
    the test with that '--quiet' option removed.  Furthermore, the
    cloned repository contains a single file, so there could be no
    case/folding collision among multiple files.

    I also notice that this patch doesn't add any tests... :)

  - I didn't understand this warning, I had to read the corresponding
    commit message to figure out what it's all about.



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems
  2018-08-11 10:09                         ` SZEDER Gábor
@ 2018-08-11 13:16                           ` Duy Nguyen
  2018-08-13 16:55                             ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Duy Nguyen @ 2018-08-11 13:16 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff Hostetler, Git Mailing List, Junio C Hamano, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	Torsten Bögershausen

On Sat, Aug 11, 2018 at 12:09 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
>
> > Paths that only differ in case work fine in a case-sensitive
> > filesystems, but if those repos are cloned in a case-insensitive one,
> > you'll get problems. The first thing to notice is "git status" will
> > never be clean with no indication what exactly is "dirty".
> >
> > This patch helps the situation a bit by pointing out the problem at
> > clone time. Even though this patch talks about case sensitivity, the
> > patch makes no assumption about folding rules by the filesystem. It
> > simply observes that if an entry has been already checked out at clone
> > time when we're about to write a new path, some folding rules are
> > behind this.
> >
> > This patch is tested with vim-colorschemes repository on a JFS partition
> > with case insensitive support on Linux. This repository has two files
> > darkBlue.vim and darkblue.vim.
> >
> > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> This patch makes 'clone http repository' in
> 't5551-http-fetch-smart.sh' fail with:
>
>   --- exp 2018-08-11 02:29:45.216641851 +0000
>   +++ actual.smudged      2018-08-11 02:29:45.264642318 +0000
>   @@ -15,3 +15,5 @@
>    < Pragma: no-cache
>    < Cache-Control: no-cache, max-age=0, must-revalidate
>    < Content-Type: application/x-git-upload-pack-result
>   +> warning: the following paths have collided and only one from the same
>   +> colliding group is in the working tree:

I was careless and checked the wrong variable (should have checked
nr_duplicates not state.nr_duplicates; the second is a pointer). So we
always get this warning (and with no following list of files)

>     I also notice that this patch doesn't add any tests... :)

This is platform specific and I was to be frank a bit lazy. Will
consider adding a test with CASE_INSENSITIVE_FS after this.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-10 15:36                     ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy
  2018-08-10 15:36                       ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-08-10 16:12                       ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Junio C Hamano
@ 2018-08-12  9:07                       ` Nguyễn Thái Ngọc Duy
  2018-08-13 15:32                         ` Jeff Hostetler
                                           ` (3 more replies)
  2 siblings, 4 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  9:07 UTC (permalink / raw)
  To: pclouds
  Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi,
	SZEDER Gábor

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

This patch is tested with vim-colorschemes repository on a JFS partition
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 v4 removes nr_duplicates (and fixes that false warning Szeder
 reported). It also hints about case insensitivity as a cause of
 problem because it's most likely the case when this warning shows up.

 builtin/clone.c  |  1 +
 cache.h          |  1 +
 entry.c          | 28 ++++++++++++++++++++++++++++
 t/t5601-clone.sh |  8 +++++++-
 unpack-trees.c   | 28 ++++++++++++++++++++++++++++
 unpack-trees.h   |  1 +
 6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
 	memset(&opts, 0, sizeof opts);
 	opts.update = 1;
 	opts.merge = 1;
+	opts.clone = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
+		 clone:1,
 		 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..c70340df8e 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 	return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+				   struct cache_entry *ce, struct stat *st)
+{
+	int i;
+
+	ce->ce_flags |= CE_MATCHED;
+
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		if (dup == ce)
+			break;
+
+		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+			continue;
+
+		if (dup->ce_stat_data.sd_ino == st->st_ino) {
+			dup->ce_flags |= CE_MATCHED;
+			break;
+		}
+	}
+#endif
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
 			return -1;
 		}
 
+		if (state->clone)
+			mark_colliding_entries(state, ce, &st);
+
 		/*
 		 * We unlink the old file, to get the new one with the
 		 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
 			git hash-object -w -t tree --stdin) &&
 		c=$(git commit-tree -m bogus $t) &&
 		git update-ref refs/heads/bogus $c &&
-		git clone -b bogus . bogus
+		git clone -b bogus . bogus 2>warning
 	)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+	grep X icasefs/warning &&
+	grep x icasefs/warning &&
+	test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
 partial_clone () {
 	       SERVER="$1" &&
 	       URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..443df048ef 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
 	state.refresh_cache = 1;
 	state.istate = index;
 
+	if (o->clone) {
+		state.clone = 1;
+		for (i = 0; i < index->cache_nr; i++)
+			index->cache[i]->ce_flags &= ~CE_MATCHED;
+	}
+
 	progress = get_progress(o);
 
 	if (o->update)
@@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+	if (o->clone) {
+		int printed_warning = 0;
+
+		for (i = 0; i < index->cache_nr; i++) {
+			struct cache_entry *ce = index->cache[i];
+
+			if (!(ce->ce_flags & CE_MATCHED))
+				continue;
+
+			if (!printed_warning) {
+				warning(_("the following paths have collided (e.g. case-sensitive paths\n"
+					  "on a case-insensitive filesystem) and only one from the same\n"
+					  "colliding group is in the working tree:\n"));
+				printed_warning = 1;
+			}
+
+			fprintf(stderr, "  '%s'\n", ce->name);
+			ce->ce_flags &= ~CE_MATCHED;
+		}
+	}
+
 	return errs != 0;
 }
 
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..d940f1c5c2 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -42,6 +42,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     clone,
 		     index_only,
 		     nontrivial_merge,
 		     trivial_merges_only,
-- 
2.18.0.1004.g6639190530


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
@ 2018-08-13 15:32                         ` Jeff Hostetler
  2018-08-13 17:18                         ` Junio C Hamano
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 96+ messages in thread
From: Jeff Hostetler @ 2018-08-13 15:32 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, gitster, newren, pawelparuzel95, peff, sandals, tboegi,
	SZEDER Gábor



On 8/12/2018 5:07 AM, Nguyễn Thái Ngọc Duy wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   v4 removes nr_duplicates (and fixes that false warning Szeder
>   reported). It also hints about case insensitivity as a cause of
>   problem because it's most likely the case when this warning shows up.
> 
>   builtin/clone.c  |  1 +
>   cache.h          |  1 +
>   entry.c          | 28 ++++++++++++++++++++++++++++
>   t/t5601-clone.sh |  8 +++++++-
>   unpack-trees.c   | 28 ++++++++++++++++++++++++++++
>   unpack-trees.h   |  1 +
>   6 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..0702b0e9d0 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
>   	memset(&opts, 0, sizeof opts);
>   	opts.update = 1;
>   	opts.merge = 1;
> +	opts.clone = 1;
>   	opts.fn = oneway_merge;
>   	opts.verbose_update = (option_verbosity >= 0);
>   	opts.src_index = &the_index;
> diff --git a/cache.h b/cache.h
> index 8b447652a7..6d6138f4f1 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1455,6 +1455,7 @@ struct checkout {
>   	unsigned force:1,
>   		 quiet:1,
>   		 not_new:1,
> +		 clone:1,
>   		 refresh_cache:1;
>   };
>   #define CHECKOUT_INIT { NULL, "" }
> diff --git a/entry.c b/entry.c
> index b5d1d3cf23..c70340df8e 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
>   	return lstat(path, st);
>   }
>   
> +static void mark_colliding_entries(const struct checkout *state,
> +				   struct cache_entry *ce, struct stat *st)
> +{
> +	int i;
> +
> +	ce->ce_flags |= CE_MATCHED;
> +
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> +	for (i = 0; i < state->istate->cache_nr; i++) {
> +		struct cache_entry *dup = state->istate->cache[i];
> +
> +		if (dup == ce)
> +			break;
> +
> +		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> +			continue;
> +
> +		if (dup->ce_stat_data.sd_ino == st->st_ino) {
> +			dup->ce_flags |= CE_MATCHED;
> +			break;
> +		}
> +	}
> +#endif
> +}
> +
>   /*
>    * Write the contents from ce out to the working tree.
>    *
> @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
>   			return -1;
>   		}
>   
> +		if (state->clone)
> +			mark_colliding_entries(state, ce, &st);
> +
>   		/*
>   		 * We unlink the old file, to get the new one with the
>   		 * right permissions (including umask, which is nasty
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 0b62037744..f2eb73bc74 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
>   			git hash-object -w -t tree --stdin) &&
>   		c=$(git commit-tree -m bogus $t) &&
>   		git update-ref refs/heads/bogus $c &&
> -		git clone -b bogus . bogus
> +		git clone -b bogus . bogus 2>warning
>   	)
>   '
>   
> +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
> +	grep X icasefs/warning &&
> +	grep x icasefs/warning &&
> +	test_i18ngrep "the following paths have collided" icasefs/warning
> +'
> +
>   partial_clone () {
>   	       SERVER="$1" &&
>   	       URL="$2" &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index cd0680f11e..443df048ef 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
>   	state.refresh_cache = 1;
>   	state.istate = index;
>   
> +	if (o->clone) {
> +		state.clone = 1;
> +		for (i = 0; i < index->cache_nr; i++)
> +			index->cache[i]->ce_flags &= ~CE_MATCHED;
> +	}
> +
>   	progress = get_progress(o);
>   
>   	if (o->update)
> @@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o)
>   	errs |= finish_delayed_checkout(&state);
>   	if (o->update)
>   		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
> +
> +	if (o->clone) {
> +		int printed_warning = 0;
> +
> +		for (i = 0; i < index->cache_nr; i++) {
> +			struct cache_entry *ce = index->cache[i];
> +
> +			if (!(ce->ce_flags & CE_MATCHED))
> +				continue;
> +
> +			if (!printed_warning) {
> +				warning(_("the following paths have collided (e.g. case-sensitive paths\n"
> +					  "on a case-insensitive filesystem) and only one from the same\n"
> +					  "colliding group is in the working tree:\n"));
> +				printed_warning = 1;
> +			}
> +
> +			fprintf(stderr, "  '%s'\n", ce->name);
> +			ce->ce_flags &= ~CE_MATCHED;
> +		}
> +	}
> +

If I'm reading this correctly, on Linux and friends, you'll print the
names of the files where the collision was detected and the paths of
any peers found from the inum matching.  And because of the #ifdef'ing
on Windows, we'll just get the former (at least for now).

That sounds fine.
Thanks
Jeff


>   	return errs != 0;
>   }
>   
> diff --git a/unpack-trees.h b/unpack-trees.h
> index c2b434c606..d940f1c5c2 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -42,6 +42,7 @@ struct unpack_trees_options {
>   	unsigned int reset,
>   		     merge,
>   		     update,
> +		     clone,
>   		     index_only,
>   		     nontrivial_merge,
>   		     trivial_merges_only,
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems
  2018-08-11 13:16                           ` Duy Nguyen
@ 2018-08-13 16:55                             ` Junio C Hamano
  2018-08-13 17:12                               ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-13 16:55 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: SZEDER Gábor, Jeff Hostetler, Git Mailing List,
	Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson,
	Torsten Bögershausen

Duy Nguyen <pclouds@gmail.com> writes:

> I was careless and checked the wrong variable (should have checked
> nr_duplicates not state.nr_duplicates; the second is a pointer). So we
> always get this warning (and with no following list of files)

Heh, does that bug go away if you got rid of the pointer-ness of the
field and store the value directly in there?

>>     I also notice that this patch doesn't add any tests... :)
>
> This is platform specific and I was to be frank a bit lazy. Will
> consider adding a test with CASE_INSENSITIVE_FS after this.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems
  2018-08-13 16:55                             ` Junio C Hamano
@ 2018-08-13 17:12                               ` Duy Nguyen
  0 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-13 17:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, Jeff Hostetler, Git Mailing List,
	Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson,
	Torsten Bögershausen

On Mon, Aug 13, 2018 at 6:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> > I was careless and checked the wrong variable (should have checked
> > nr_duplicates not state.nr_duplicates; the second is a pointer). So we
> > always get this warning (and with no following list of files)
>
> Heh, does that bug go away if you got rid of the pointer-ness of the
> field and store the value directly in there?

You mean replacing the pointer with a real counter in struct checkout?
That would not work (it was my first option) because struct checkout
is passed around as a const struct. entry.c code is not allowed to
make any updates there. So I got rid of both "nr_duplicates" and just
count again at the bottom of check_updates(). It's not that expensive
and it simplifies the code.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-08-13 15:32                         ` Jeff Hostetler
@ 2018-08-13 17:18                         ` Junio C Hamano
  2018-08-15 19:08                         ` Torsten Bögershausen
  2018-08-17 16:16                         ` [PATCH v5] " Nguyễn Thái Ngọc Duy
  3 siblings, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-13 17:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, newren, pawelparuzel95, peff, sandals, tboegi,
	SZEDER Gábor

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
>
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
>
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  v4 removes nr_duplicates (and fixes that false warning Szeder
>  reported). It also hints about case insensitivity as a cause of
>  problem because it's most likely the case when this warning shows up.

Ah, you no longer have that counter and the pointer to the counter,
as you do not even report how many paths collide ;-)  Makes sense.

> diff --git a/entry.c b/entry.c
> index b5d1d3cf23..c70340df8e 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
>  	return lstat(path, st);
>  }
>  
> +static void mark_colliding_entries(const struct checkout *state,
> +				   struct cache_entry *ce, struct stat *st)
> +{
> +	int i;
> +
> +	ce->ce_flags |= CE_MATCHED;
> +
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> +	for (i = 0; i < state->istate->cache_nr; i++) {
> +		struct cache_entry *dup = state->istate->cache[i];
> +
> +		if (dup == ce)
> +			break;
> +
> +		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> +			continue;
> +
> +		if (dup->ce_stat_data.sd_ino == st->st_ino) {
> +			dup->ce_flags |= CE_MATCHED;
> +			break;
> +		}
> +	}
> +#endif
> +}

OK.  The whole loop might want to become a call to a helper function
whose implementation is platform dependent in the future, but that
should be kept outside the topic and left for a future enhancement.

> @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
>  			return -1;
>  		}
>  
> +		if (state->clone)
> +			mark_colliding_entries(state, ce, &st);

OK.  I haven't carefully looked at the codepath but is it more
involved to instead *not* check out this ce (and leave the working
tree file that is already there for another path in the index
alone)?  I suspect it won't be as simple as

		if (state->clone) {
			mark_colliding_entries(state, ce, &st);
			return -1;
		}

but I think it would give much more pleasant end-user experience if
we can do so, especially on GIT_WINDOWS_NATIVE.  I would imagine
that the first thing those who see the message "foo.txt have
collided with something else we are not telling you" would want to
do is to see what "foo.txt" contains---and it may be obvious to
human that it contains the contents intended for "Foo.txt" instead,
if we somehow refrained from overwriting it here, which would
compensate for the lack of "this is the other path that collided
with your file."

It is perfectly an acceptable answer if it is "I looked at it, and
it is a lot more involved as there are these fallouts from the
codepaths that the control flows later from this point you haven't
checked and considered---let's keep overwriting, it is much safer."

> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 0b62037744..f2eb73bc74 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
>  			git hash-object -w -t tree --stdin) &&
>  		c=$(git commit-tree -m bogus $t) &&
>  		git update-ref refs/heads/bogus $c &&
> -		git clone -b bogus . bogus
> +		git clone -b bogus . bogus 2>warning
>  	)
>  '
>  
> +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
> +	grep X icasefs/warning &&
> +	grep x icasefs/warning &&
> +	test_i18ngrep "the following paths have collided" icasefs/warning
> +'
> +

Ah, I was wondering why possible error message needs to be hidden in
the previous test---it is not hiding; it is capturing to look for
the paths in the message.  Makes sense.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
  2018-08-13 15:32                         ` Jeff Hostetler
  2018-08-13 17:18                         ` Junio C Hamano
@ 2018-08-15 19:08                         ` Torsten Bögershausen
  2018-08-15 19:35                           ` Duy Nguyen
  2018-08-15 19:38                           ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano
  2018-08-17 16:16                         ` [PATCH v5] " Nguyễn Thái Ngọc Duy
  3 siblings, 2 replies; 96+ messages in thread
From: Torsten Bögershausen @ 2018-08-15 19:08 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals,
	SZEDER Gábor

On Sun, Aug 12, 2018 at 11:07:14AM +0200, Nguyễn Thái Ngọc Duy wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  v4 removes nr_duplicates (and fixes that false warning Szeder
>  reported). It also hints about case insensitivity as a cause of
>  problem because it's most likely the case when this warning shows up.
> 
>  builtin/clone.c  |  1 +
>  cache.h          |  1 +
>  entry.c          | 28 ++++++++++++++++++++++++++++
>  t/t5601-clone.sh |  8 +++++++-
>  unpack-trees.c   | 28 ++++++++++++++++++++++++++++
>  unpack-trees.h   |  1 +
>  6 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..0702b0e9d0 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
>  	memset(&opts, 0, sizeof opts);
>  	opts.update = 1;
>  	opts.merge = 1;
> +	opts.clone = 1;
>  	opts.fn = oneway_merge;
>  	opts.verbose_update = (option_verbosity >= 0);
>  	opts.src_index = &the_index;
> diff --git a/cache.h b/cache.h
> index 8b447652a7..6d6138f4f1 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1455,6 +1455,7 @@ struct checkout {
>  	unsigned force:1,
>  		 quiet:1,
>  		 not_new:1,
> +		 clone:1,
>  		 refresh_cache:1;
>  };
>  #define CHECKOUT_INIT { NULL, "" }
> diff --git a/entry.c b/entry.c
> index b5d1d3cf23..c70340df8e 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
>  	return lstat(path, st);
>  }
>  
> +static void mark_colliding_entries(const struct checkout *state,
> +				   struct cache_entry *ce, struct stat *st)
> +{
> +	int i;
> +
> +	ce->ce_flags |= CE_MATCHED;
> +
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> +	for (i = 0; i < state->istate->cache_nr; i++) {
> +		struct cache_entry *dup = state->istate->cache[i];
> +
> +		if (dup == ce)
> +			break;
> +
> +		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> +			continue;
> +

Should the following be protected by core.checkstat ? 
	if (check_stat) {


> +		if (dup->ce_stat_data.sd_ino == st->st_ino) {
> +			dup->ce_flags |= CE_MATCHED;
> +			break;
> +		}
> +	}
> +#endif

Another thing is that we switch of the ASCII case-folding-detection-logic
off for Windows users, even if we otherwise rely on icase.
I think we can use fspathcmp() as a fallback. when inodes fail,
because we may be on a network file system.
(I don't have a test setup at the moment, but what happens with inodes
when a Windows machine exports a share to Linux or Mac ?)

Is there a chance to get the fspathcmp() back, like this ?

static void mark_colliding_entries(const struct checkout *state,
				   struct cache_entry *ce, struct stat *st)
{
	int i;
	ce->ce_flags |= CE_MATCHED;

	for (i = 0; i < state->istate->cache_nr; i++) {
		struct cache_entry *dup = state->istate->cache[i];
		int folded = 0;

		if (dup == ce)
			break;

		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
			continue;

		if (!fspathcmp(dup->name, ce->name))
			folded = 1;

#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
		if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
			folded = 1;
#endif
		if (folded) {
			dup->ce_flags |= CE_MATCHED;
			break;
		}
	}
}


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-15 19:08                         ` Torsten Bögershausen
@ 2018-08-15 19:35                           ` Duy Nguyen
  2018-08-16 15:56                             ` [PATCH] config.txt: clarify core.checkStat = minimal Nguyễn Thái Ngọc Duy
  2018-08-15 19:38                           ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano
  1 sibling, 1 reply; 96+ messages in thread
From: Duy Nguyen @ 2018-08-15 19:35 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Jeff Hostetler, Git Mailing List, Junio C Hamano, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor

On Wed, Aug 15, 2018 at 9:08 PM Torsten Bögershausen <tboegi@web.de> wrote:
> > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> > +     for (i = 0; i < state->istate->cache_nr; i++) {
> > +             struct cache_entry *dup = state->istate->cache[i];
> > +
> > +             if (dup == ce)
> > +                     break;
> > +
> > +             if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> > +                     continue;
> > +
>
> Should the following be protected by core.checkstat ?
>         if (check_stat) {

Good catch! st_ino is ignored if core.checkStat is false. I will
probably send a separate patch to add more details to config.txt about
this key.

> > +             if (dup->ce_stat_data.sd_ino == st->st_ino) {
> > +                     dup->ce_flags |= CE_MATCHED;
> > +                     break;
> > +             }
> > +     }
> > +#endif
>
> Another thing is that we switch of the ASCII case-folding-detection-logic
> off for Windows users, even if we otherwise rely on icase.
> I think we can use fspathcmp() as a fallback. when inodes fail,
> because we may be on a network file system.

I admit I did not think about network file system. Will spend some
time (and hopefully not on nfs kernel code) on it.

For falling back on fspathcmp even on Windows, is it really safe? I'm
on Linux and never have to deal with this issue to have any
experience. It does sound good though because it should be a subset
for any "weird" filesystems out there.

> (I don't have a test setup at the moment, but what happens with inodes
> when a Windows machine exports a share to Linux or Mac ?)
>
> Is there a chance to get the fspathcmp() back, like this ?
>
> static void mark_colliding_entries(const struct checkout *state,
>                                    struct cache_entry *ce, struct stat *st)
> {
>         int i;
>         ce->ce_flags |= CE_MATCHED;
>
>         for (i = 0; i < state->istate->cache_nr; i++) {
>                 struct cache_entry *dup = state->istate->cache[i];
>                 int folded = 0;
>
>                 if (dup == ce)
>                         break;
>
>                 if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>                         continue;
>
>                 if (!fspathcmp(dup->name, ce->name))
>                         folded = 1;
>
> #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
>                 if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
>                         folded = 1;
> #endif
>                 if (folded) {
>                         dup->ce_flags |= CE_MATCHED;
>                         break;
>                 }
>         }
> }
>


-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-15 19:08                         ` Torsten Bögershausen
  2018-08-15 19:35                           ` Duy Nguyen
@ 2018-08-15 19:38                           ` Junio C Hamano
  2018-08-16 14:03                             ` Torsten Bögershausen
  1 sibling, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-15 19:38 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Nguyễn Thái Ngọc Duy, git, git, newren,
	pawelparuzel95, peff, sandals, SZEDER Gábor

Torsten Bögershausen <tboegi@web.de> writes:

>> +
>> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
>> +	for (i = 0; i < state->istate->cache_nr; i++) {
>> +		struct cache_entry *dup = state->istate->cache[i];
>> +
>> +		if (dup == ce)
>> +			break;
>> +
>> +		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>> +			continue;
>> +
>
> Should the following be protected by core.checkstat ? 
> 	if (check_stat) {

I do not think such a if statement is strictly necessary.

Even if check_stat tells us "when checking if a cached stat
information tells us that the path may have modified, use minimum
set of fields from the 'struct stat'", we still capture and update
the values from the same "full" set of fields when we mark a cache
entry up-to-date.  So it all depends on why you are limiting with
check_stat.  Is it because stdev is unusable?  Is it because nsec is
unusable?  Is it because ino is unusable?  Only in the last case,
paying attention to check_stat will reduce the false positive.

But then you made me wonder what value check_stat has on Windows.
If it is false, perhaps we do not even need the conditional
compilation, which is a huge plus.

>> +		if (dup->ce_stat_data.sd_ino == st->st_ino) {
>> +			dup->ce_flags |= CE_MATCHED;
>> +			break;
>> +		}
>> +	}
>> +#endif
>
> Another thing is that we switch of the ASCII case-folding-detection-logic
> off for Windows users, even if we otherwise rely on icase.
> I think we can use fspathcmp() as a fallback. when inodes fail,
> because we may be on a network file system.
>
> (I don't have a test setup at the moment, but what happens with inodes
> when a Windows machine exports a share to Linux or Mac ?)
>
> Is there a chance to get the fspathcmp() back, like this ?

If fspathcmp() never gives false positives, I do not think we would
mind using it like your update.  False negatives are fine, as that
is better than just punting the whole thing when there is no usable
inum.  And we do not care all that much if it is more expensive;
this is an error codepath after all.

And from code structure's point of view, I think it makes sense.  It
would be even better if we can lose the conditional compilation.

Another thing we maybe want to see is if we can update the caller of
this function so that we do not overwrite the earlier checkout with
the data for this path.  When two paths collide, we check out one of
the paths without reporting (because we cannot notice), then attempt
to check out the other path and report (because we do notice the
previous one with lstat()).  The current code then goes on and overwrites
the file with the contents from the "other" path.

Even if we had false negative in this loop, if we leave the contents
for the earlier path while reporting the "other" path, then the user
can get curious, inspect what contents the "other" path has on the
filesystem, and can notice that it belongs to the (unreported--due
to false negative) earlier path.

> static void mark_colliding_entries(const struct checkout *state,
> 				   struct cache_entry *ce, struct stat *st)
> {
> 	int i;
> 	ce->ce_flags |= CE_MATCHED;
>
> 	for (i = 0; i < state->istate->cache_nr; i++) {
> 		struct cache_entry *dup = state->istate->cache[i];
> 		int folded = 0;
>
> 		if (dup == ce)
> 			break;
>
> 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> 			continue;
>
> 		if (!fspathcmp(dup->name, ce->name))
> 			folded = 1;
>
> #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> 		if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
> 			folded = 1;
> #endif
> 		if (folded) {
> 			dup->ce_flags |= CE_MATCHED;
> 			break;
> 		}
> 	}
> }

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-15 19:38                           ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano
@ 2018-08-16 14:03                             ` Torsten Bögershausen
  2018-08-16 15:42                               ` Duy Nguyen
  2018-08-16 16:23                               ` Junio C Hamano
  0 siblings, 2 replies; 96+ messages in thread
From: Torsten Bögershausen @ 2018-08-16 14:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, git, git, newren,
	pawelparuzel95, peff, sandals, SZEDER Gábor

On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:

This should answer Duys comments as well.
> Torsten Bögershausen <tboegi@web.de> writes:
> 
[snip]
> > Should the following be protected by core.checkstat ? 
> > 	if (check_stat) {
> 
> I do not think such a if statement is strictly necessary.
> 
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date.  So it all depends on why you are limiting with
> check_stat.  Is it because stdev is unusable?  Is it because nsec is
> unusable?  Is it because ino is unusable?  Only in the last case,
> paying attention to check_stat will reduce the false positive.
> 
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.

Agreed:
check_stat is 0 on Windows, and inum is allways 0 in lstat().
I was thinking about systems which don't have inodes and inum,
and then generate an inum in memory, sometimes random.
After a reboot or a re-mount of the file systems those ino values
change.
However, for the initial clone we are fine in any case.

> 
> >> +		if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> +			dup->ce_flags |= CE_MATCHED;
> >> +			break;
> >> +		}
> >> +	}
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
> 
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update.  False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum.  And we do not care all that much if it is more expensive;
> this is an error codepath after all.
> 
> And from code structure's point of view, I think it makes sense.  It
> would be even better if we can lose the conditional compilation.

The current implementation of fspathcmp() does not give false positvies,
and future versions should not either.
All case-insentive file systems have always treated 'a-z' equal to 'A-Z'.
In FAT MS/DOS there had only been uppercase letters as file names,
and `type file.txt` (the equivilant to ´cat file.txt´ in *nix)
simply resultet in `type FILE.TXT`
Later, with VFAT and later with HPFS/NTFS a file could be stored on
disk as "File.txt".
From now on  ´type FILE.TXT´ still worked, (and all other upper-lowercase
combinations).
This all is probably nothing new.
The main point should be that fspathcmp() should never return a false positive,
and I think we all agree on that. 


Now back to the compiler switch:
Windows always set inum to 0 and I can't think about a situation where
a file in a working tree gets inum = 0, can we use the following:

static void mark_colliding_entries(const struct checkout *state,
				   struct cache_entry *ce, struct stat *st)
{
	int i;
	ce->ce_flags |= CE_MATCHED;

	for (i = 0; i < state->istate->cache_nr; i++) {
		struct cache_entry *dup = state->istate->cache[i];
		int folded = 0;

		if (dup == ce)
			break;

		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
			continue;
		/*
		 * Windows sets ino to 0. On other FS ino = 0 will already be
		 *  used, so we don't see it for a file in a Git working tree
		 */
		if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
			folded = 1;

		/*
		 * Fallback for NTFS and other case insenstive FS,
		 * which don't use POSIX inums
		 */
		if (!fspathcmp(dup->name, ce->name))
			folded = 1;

		if (folded) {
			dup->ce_flags |= CE_MATCHED;
			break;
		}
	}
}


> 
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for this path.  When two paths collide, we check out one of
> the paths without reporting (because we cannot notice), then attempt
> to check out the other path and report (because we do notice the
> previous one with lstat()).  The current code then goes on and overwrites
> the file with the contents from the "other" path.
> 
> Even if we had false negative in this loop, if we leave the contents
> for the earlier path while reporting the "other" path, then the user
> can get curious, inspect what contents the "other" path has on the
> filesystem, and can notice that it belongs to the (unreported--due
> to false negative) earlier path.
> 
[snip]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-16 14:03                             ` Torsten Bögershausen
@ 2018-08-16 15:42                               ` Duy Nguyen
  2018-08-16 16:23                               ` Junio C Hamano
  1 sibling, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-16 15:42 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Junio C Hamano, Jeff Hostetler, Git Mailing List, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor

On Thu, Aug 16, 2018 at 4:03 PM Torsten Bögershausen <tboegi@web.de> wrote:
> check_stat is 0 on Windows,

How? check_stat is 1 by default. And "git init" does not add this key
on new repos.

> Now back to the compiler switch:
> Windows always set inum to 0 and I can't think about a situation where
> a file in a working tree gets inum = 0, can we use the following:

I did consider using zero inum, but dropped it when thinking about
checking all file systems. Are you sure zero inode can't be valid?
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH] config.txt: clarify core.checkStat = minimal
  2018-08-15 19:35                           ` Duy Nguyen
@ 2018-08-16 15:56                             ` Nguyễn Thái Ngọc Duy
  2018-08-16 17:01                               ` Junio C Hamano
  0 siblings, 1 reply; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-16 15:56 UTC (permalink / raw)
  To: pclouds
  Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals,
	szeder.dev, tboegi

The description of this key does not really tell what 'minimal' mode
checks exactly. More information about this mode can be found in the
commit message of c08e4d5b5c (Enable minimal stat checking -
2013-01-22).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd8d27e761..5c41314dd5 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -466,6 +466,8 @@ core.checkStat::
 	and work tree. The user can set this to 'default' or
 	'minimal'. Default (or explicitly 'default'), is to check
 	all fields, including the sub-second part of mtime and ctime.
+	'minimal' only checks size and the whole second part of mtime
+	and ctime.
 
 core.quotePath::
 	Commands that output paths (e.g. 'ls-files', 'diff'), will
-- 
2.18.0.1004.g6639190530


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
  2018-08-16 14:03                             ` Torsten Bögershausen
  2018-08-16 15:42                               ` Duy Nguyen
@ 2018-08-16 16:23                               ` Junio C Hamano
  1 sibling, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-16 16:23 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Nguyễn Thái Ngọc Duy, git, git, newren,
	pawelparuzel95, peff, sandals, SZEDER Gábor

Torsten Bögershausen <tboegi@web.de> writes:

> check_stat is 0 on Windows, and inum is allways 0 in lstat().
> I was thinking about systems which don't have inodes and inum,
> and then generate an inum in memory, sometimes random.
> After a reboot or a re-mount of the file systems those ino values
> change.
> However, for the initial clone we are fine in any case.

Yup.

> Now back to the compiler switch:
> Windows always set inum to 0 and I can't think about a situation where
> a file in a working tree gets inum = 0, can we use the following:
>
> static void mark_colliding_entries(const struct checkout *state,
> 				   struct cache_entry *ce, struct stat *st)
> {
> 	int i;
> 	ce->ce_flags |= CE_MATCHED;
>
> 	for (i = 0; i < state->istate->cache_nr; i++) {
> 		struct cache_entry *dup = state->istate->cache[i];
> 		int folded = 0;
>
> 		if (dup == ce)
> 			break;
>
> 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> 			continue;
> 		/*
> 		 * Windows sets ino to 0. On other FS ino = 0 will already be
> 		 *  used, so we don't see it for a file in a Git working tree
> 		 */
> 		if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
> 			folded = 1;

Hmm, that is tempting but feels slightly too magical to my taste.
Others may easily be able to persuade me to change my mind in this
case, though.

> 		/*
> 		 * Fallback for NTFS and other case insenstive FS,
> 		 * which don't use POSIX inums
> 		 */
> 		if (!fspathcmp(dup->name, ce->name))
> 			folded = 1;
>
> 		if (folded) {
> 			dup->ce_flags |= CE_MATCHED;
> 			break;
> 		}
> 	}
> }
>
>
>> 
>> Another thing we maybe want to see is if we can update the caller of
>> this function so that we do not overwrite the earlier checkout with
>> the data for this path.  When two paths collide, we check out one of
>> the paths without reporting (because we cannot notice), then attempt
>> to check out the other path and report (because we do notice the
>> previous one with lstat()).  The current code then goes on and overwrites
>> the file with the contents from the "other" path.
>> 
>> Even if we had false negative in this loop, if we leave the contents
>> for the earlier path while reporting the "other" path, then the user
>> can get curious, inspect what contents the "other" path has on the
>> filesystem, and can notice that it belongs to the (unreported--due
>> to false negative) earlier path.
>> 
> [snip]

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] config.txt: clarify core.checkStat = minimal
  2018-08-16 15:56                             ` [PATCH] config.txt: clarify core.checkStat = minimal Nguyễn Thái Ngọc Duy
@ 2018-08-16 17:01                               ` Junio C Hamano
  2018-08-16 18:19                                 ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-16 17:01 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The description of this key does not really tell what 'minimal' mode
> checks exactly. More information about this mode can be found in the
> commit message of c08e4d5b5c (Enable minimal stat checking -
> 2013-01-22).
>

While I agree that we need to do _something_, I am not sure if this
change adds sufficient value.  I _think_ those who wonder if they
want to configure this want to know what are _not_ looked at
(relative to the "default") more than what are _still_ looked at,
partly because the description of "default" is already bogus and
says "check all fields", which is horrible for two reasons.  It is
unclear what are in "all" fields in the first place, and also we do
not look at all fields (e.g. we do not look at atime for obvious
reasons).

So perhaps

	When this configuration variable is missing or is set to
	`default`, many fields in the stat structure are checked to
	detect if a file has been modified since Git looked at it.
	Among these fields, when this configuration variable is set
	to `minimal`, sub-second part of mtime and ctime, the uid
	and gid of the owner of the file, the inode number (and the
	device number, if Git was compiled to use it), are excluded
	from the check, leaving only the whole-second part of mtime
	(and ctime, if `core.trustCtime` is set) and the filesize to
	be checked.

or something?

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/config.txt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index fd8d27e761..5c41314dd5 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -466,6 +466,8 @@ core.checkStat::
>  	and work tree. The user can set this to 'default' or
>  	'minimal'. Default (or explicitly 'default'), is to check
>  	all fields, including the sub-second part of mtime and ctime.
> +	'minimal' only checks size and the whole second part of mtime
> +	and ctime.
>  
>  core.quotePath::
>  	Commands that output paths (e.g. 'ls-files', 'diff'), will

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] config.txt: clarify core.checkStat = minimal
  2018-08-16 17:01                               ` Junio C Hamano
@ 2018-08-16 18:19                                 ` Duy Nguyen
  2018-08-16 22:29                                   ` Junio C Hamano
  2018-08-17 15:26                                   ` Junio C Hamano
  0 siblings, 2 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-16 18:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff Hostetler, Git Mailing List, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor, Torsten Bögershausen

On Thu, Aug 16, 2018 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
> > The description of this key does not really tell what 'minimal' mode
> > checks exactly. More information about this mode can be found in the
> > commit message of c08e4d5b5c (Enable minimal stat checking -
> > 2013-01-22).
> >
>
> While I agree that we need to do _something_, I am not sure if this
> change adds sufficient value.  I _think_ those who wonder if they
> want to configure this want to know what are _not_ looked at
> (relative to the "default") more than what are _still_ looked at,
> partly because the description of "default" is already bogus and
> says "check all fields", which is horrible for two reasons.  It is
> unclear what are in "all" fields in the first place, and also we do
> not look at all fields (e.g. we do not look at atime for obvious
> reasons).
>
> So perhaps
>
>         When this configuration variable is missing or is set to
>         `default`, many fields in the stat structure are checked to
>         detect if a file has been modified since Git looked at it.
>         Among these fields, when this configuration variable is set
>         to `minimal`, sub-second part of mtime and ctime, the uid
>         and gid of the owner of the file, the inode number (and the
>         device number, if Git was compiled to use it), are excluded
>         from the check, leaving only the whole-second part of mtime
>         (and ctime, if `core.trustCtime` is set) and the filesize to
>         be checked.
>
> or something?

Perfect. I could wrap it in a patch, but I feel you should take
authorship for that one. I'll leave it to you to create this commit.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] config.txt: clarify core.checkStat = minimal
  2018-08-16 18:19                                 ` Duy Nguyen
@ 2018-08-16 22:29                                   ` Junio C Hamano
  2018-08-17 15:26                                   ` Junio C Hamano
  1 sibling, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2018-08-16 22:29 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff Hostetler, Git Mailing List, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor, Torsten Bögershausen

Duy Nguyen <pclouds@gmail.com> writes:

> On Thu, Aug 16, 2018 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>>
>> > The description of this key does not really tell what 'minimal' mode
>> > checks exactly. More information about this mode can be found in the
>> > commit message of c08e4d5b5c (Enable minimal stat checking -
>> > 2013-01-22).
>> >
>>
>> While I agree that we need to do _something_, I am not sure if this
>> change adds sufficient value.  I _think_ those who wonder if they
>> want to configure this want to know what are _not_ looked at
>> (relative to the "default") more than what are _still_ looked at,
>> partly because the description of "default" is already bogus and
>> says "check all fields", which is horrible for two reasons.  It is
>> unclear what are in "all" fields in the first place, and also we do
>> not look at all fields (e.g. we do not look at atime for obvious
>> reasons).
>>
>> So perhaps
>>
>>         When this configuration variable is missing or is set to
>>         `default`, many fields in the stat structure are checked to
>>         detect if a file has been modified since Git looked at it.
>>         Among these fields, when this configuration variable is set
>>         to `minimal`, sub-second part of mtime and ctime, the uid
>>         and gid of the owner of the file, the inode number (and the
>>         device number, if Git was compiled to use it), are excluded
>>         from the check, leaving only the whole-second part of mtime
>>         (and ctime, if `core.trustCtime` is set) and the filesize to
>>         be checked.
>>
>> or something?
>
> Perfect. I could wrap it in a patch, but I feel you should take
> authorship for that one. I'll leave it to you to create this commit.

If I find time after today's integration cycle, perhaps I can get to
it, but not until then (so the above won't be in today's pushout).

Thanks for reading it over.



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] config.txt: clarify core.checkStat = minimal
  2018-08-16 18:19                                 ` Duy Nguyen
  2018-08-16 22:29                                   ` Junio C Hamano
@ 2018-08-17 15:26                                   ` Junio C Hamano
  2018-08-17 15:29                                     ` Duy Nguyen
  1 sibling, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-17 15:26 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff Hostetler, Git Mailing List, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor, Torsten Bögershausen

Duy Nguyen <pclouds@gmail.com> writes:

> Perfect. I could wrap it in a patch, but I feel you should take
> authorship for that one. I'll leave it to you to create this commit.

OK, here is what I ended up with.  An extra paragraph was taken from
the old commit you referrred to, which is probably the only
remaining part from your contribution, so the attribution has been
demoted to "Helped-by", but your initiative still is appreciated
very much.

-- >8 --
Subject: [PATCH] config.txt: clarify core.checkStat

The description of this key does not really tell what the 'minimal'
mode checks and does not check.  The description for the 'default'
mode is not much better and just says 'all fields', which is unclear
and is not even correct (e.g. we do not look at 'atime').

Spell out what are and what are not checked under the 'minimal' mode
relative to the 'default' mode to help those who want to decide if
they want to use the 'minimal' mode, also taking information about
this mode from the commit message of c08e4d5b5c (Enable minimal stat
checking - 2013-01-22).

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..933d719137 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -449,10 +449,20 @@ core.untrackedCache::
 	See linkgit:git-update-index[1]. `keep` by default.
 
 core.checkStat::
-	Determines which stat fields to match between the index
-	and work tree. The user can set this to 'default' or
-	'minimal'. Default (or explicitly 'default'), is to check
-	all fields, including the sub-second part of mtime and ctime.
+	When missing or is set to `default`, many fields in the stat
+	structure are checked to detect if a file has been modified
+	since Git looked at it.  When this configuration variable is
+	set to `minimal`, sub-second part of mtime and ctime, the
+	uid and gid of the owner of the file, the inode number (and
+	the device number, if Git was compiled to use it), are
+	excluded from the check among these fields, leaving only the
+	whole-second part of mtime (and ctime, if `core.trustCtime`
+	is set) and the filesize to be checked.
++
+There are implementations of Git that do not leave usable values in
+some fields (e.g. JGit); by excluding these fields from the
+comparison, the `minimal` mode may help interoperability when the
+same repository is used by these other systems at the same time.
 
 core.quotePath::
 	Commands that output paths (e.g. 'ls-files', 'diff'), will
-- 
2.18.0-666-g63749b2dea


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] config.txt: clarify core.checkStat = minimal
  2018-08-17 15:26                                   ` Junio C Hamano
@ 2018-08-17 15:29                                     ` Duy Nguyen
  0 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-17 15:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff Hostetler, Git Mailing List, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor, Torsten Bögershausen

On Fri, Aug 17, 2018 at 5:26 PM Junio C Hamano <gitster@pobox.com> wrote:
> -- >8 --
> Subject: [PATCH] config.txt: clarify core.checkStat
>
> The description of this key does not really tell what the 'minimal'
> mode checks and does not check.  The description for the 'default'
> mode is not much better and just says 'all fields', which is unclear
> and is not even correct (e.g. we do not look at 'atime').
>
> Spell out what are and what are not checked under the 'minimal' mode
> relative to the 'default' mode to help those who want to decide if
> they want to use the 'minimal' mode, also taking information about
> this mode from the commit message of c08e4d5b5c (Enable minimal stat
> checking - 2013-01-22).

Looking good. This does make me want to adjust $GIT_DIR/index format
to optionally not store extra fields if we know we're not going to use
them. But that's a topic for another day.

> Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/config.txt | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ab641bf5a9..933d719137 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -449,10 +449,20 @@ core.untrackedCache::
>         See linkgit:git-update-index[1]. `keep` by default.
>
>  core.checkStat::
> -       Determines which stat fields to match between the index
> -       and work tree. The user can set this to 'default' or
> -       'minimal'. Default (or explicitly 'default'), is to check
> -       all fields, including the sub-second part of mtime and ctime.
> +       When missing or is set to `default`, many fields in the stat
> +       structure are checked to detect if a file has been modified
> +       since Git looked at it.  When this configuration variable is
> +       set to `minimal`, sub-second part of mtime and ctime, the
> +       uid and gid of the owner of the file, the inode number (and
> +       the device number, if Git was compiled to use it), are
> +       excluded from the check among these fields, leaving only the
> +       whole-second part of mtime (and ctime, if `core.trustCtime`
> +       is set) and the filesize to be checked.
> ++
> +There are implementations of Git that do not leave usable values in
> +some fields (e.g. JGit); by excluding these fields from the
> +comparison, the `minimal` mode may help interoperability when the
> +same repository is used by these other systems at the same time.
>
>  core.quotePath::
>         Commands that output paths (e.g. 'ls-files', 'diff'), will
> --
> 2.18.0-666-g63749b2dea
>


-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
                                           ` (2 preceding siblings ...)
  2018-08-15 19:08                         ` Torsten Bögershausen
@ 2018-08-17 16:16                         ` Nguyễn Thái Ngọc Duy
  2018-08-17 17:20                           ` Junio C Hamano
                                             ` (2 more replies)
  3 siblings, 3 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-17 16:16 UTC (permalink / raw)
  To: pclouds
  Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals,
	szeder.dev, tboegi

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.

This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 v5 respects core.checkStat and sorts the output case-insensitively.

 I still don't trust magic st_ino zero, or core.checkStat being zero
 on Windows, so the #if condition still remains but it covers smallest
 area possible and I tested it by manually make it "#if 1"

 The fallback with fspathcmp() is only done when inode can't be
 trusted because strcmp is more expensive and when fspathcmp() learns
 more about real world in the future, it could become even more
 expensive.

 The output sorting is the result of Sublime-Gitignore repo being
 reported recently. It's not perfect but it should help seeing the
 groups in normal case.

 builtin/clone.c  |  1 +
 cache.h          |  1 +
 entry.c          | 31 +++++++++++++++++++++++++++++++
 t/t5601-clone.sh |  8 +++++++-
 unpack-trees.c   | 35 +++++++++++++++++++++++++++++++++++
 unpack-trees.h   |  1 +
 6 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
 	memset(&opts, 0, sizeof opts);
 	opts.update = 1;
 	opts.merge = 1;
+	opts.clone = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
+		 clone:1,
 		 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..8766e27255 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 	return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+				   struct cache_entry *ce, struct stat *st)
+{
+	int i, trust_ino = check_stat;
+
+#if defined(GIT_WINDOWS_NATIVE)
+	trust_ino = 0;
+#endif
+
+	ce->ce_flags |= CE_MATCHED;
+
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		if (dup == ce)
+			break;
+
+		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+			continue;
+
+		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
+		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+			dup->ce_flags |= CE_MATCHED;
+			break;
+		}
+	}
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce,
 			return -1;
 		}
 
+		if (state->clone)
+			mark_colliding_entries(state, ce, &st);
+
 		/*
 		 * We unlink the old file, to get the new one with the
 		 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
 			git hash-object -w -t tree --stdin) &&
 		c=$(git commit-tree -m bogus $t) &&
 		git update-ref refs/heads/bogus $c &&
-		git clone -b bogus . bogus
+		git clone -b bogus . bogus 2>warning
 	)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+	grep X icasefs/warning &&
+	grep x icasefs/warning &&
+	test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
 partial_clone () {
 	       SERVER="$1" &&
 	       URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..4338fee3b7 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
 	state.refresh_cache = 1;
 	state.istate = index;
 
+	if (o->clone) {
+		state.clone = 1;
+		for (i = 0; i < index->cache_nr; i++)
+			index->cache[i]->ce_flags &= ~CE_MATCHED;
+	}
+
 	progress = get_progress(o);
 
 	if (o->update)
@@ -423,6 +429,35 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+	if (o->clone) {
+		struct string_list list = STRING_LIST_INIT_NODUP;
+		int i;
+
+		for (i = 0; i < index->cache_nr; i++) {
+			struct cache_entry *ce = index->cache[i];
+
+			if (!(ce->ce_flags & CE_MATCHED))
+				continue;
+
+			string_list_append(&list, ce->name);
+			ce->ce_flags &= ~CE_MATCHED;
+		}
+
+		list.cmp = fspathcmp;
+		string_list_sort(&list);
+
+		if (list.nr)
+			warning(_("the following paths have collided (e.g. case-sensitive paths\n"
+				  "on a case-insensitive filesystem) and only one from the same\n"
+				  "colliding group is in the working tree:\n"));
+
+		for (i = 0; i < list.nr; i++)
+			fprintf(stderr, "  '%s'\n", list.items[i].string);
+
+		string_list_clear(&list, 0);
+	}
+
 	return errs != 0;
 }
 
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..d940f1c5c2 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -42,6 +42,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     clone,
 		     index_only,
 		     nontrivial_merge,
 		     trivial_merges_only,
-- 
2.18.0.1004.g6639190530


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-08-17 16:16                         ` [PATCH v5] " Nguyễn Thái Ngọc Duy
@ 2018-08-17 17:20                           ` Junio C Hamano
  2018-08-17 18:00                             ` Duy Nguyen
  2018-08-17 19:46                           ` Torsten Bögershausen
  2018-11-19  8:20                           ` Carlo Marcelo Arenas Belón
  2 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-08-17 17:20 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

>  I still don't trust magic st_ino zero, or core.checkStat being zero
>  on Windows, so the #if condition still remains but it covers smallest
>  area possible and I tested it by manually make it "#if 1"
>
>  The fallback with fspathcmp() is only done when inode can't be
>  trusted because strcmp is more expensive and when fspathcmp() learns
>  more about real world in the future, it could become even more
>  expensive.
>
>  The output sorting is the result of Sublime-Gitignore repo being
>  reported recently. It's not perfect but it should help seeing the
>  groups in normal case.

Looks small and safe.

> +
> +	if (o->clone) {
> +		struct string_list list = STRING_LIST_INIT_NODUP;
> +		int i;
> +
> +		for (i = 0; i < index->cache_nr; i++) {
> +			struct cache_entry *ce = index->cache[i];
> +
> +			if (!(ce->ce_flags & CE_MATCHED))
> +				continue;
> +
> +			string_list_append(&list, ce->name);
> +			ce->ce_flags &= ~CE_MATCHED;
> +		}
> +
> +		list.cmp = fspathcmp;
> +		string_list_sort(&list);
> +
> +		if (list.nr)
> +			warning(_("the following paths have collided (e.g. case-sensitive paths\n"
> +				  "on a case-insensitive filesystem) and only one from the same\n"
> +				  "colliding group is in the working tree:\n"));
> +
> +		for (i = 0; i < list.nr; i++)
> +			fprintf(stderr, "  '%s'\n", list.items[i].string);
> +
> +		string_list_clear(&list, 0);

I would have written the "sort, show warning, and list" all inside
"if (list.nr)" block, leaving list-clear outside, which would have
made the logic a bit cleaner.  The reader does not have to bother
thinking "ah, when list.nr==0, this is a no-op anyway" to skip them
if written that way.

I highly suspect that the above was written in that way to reduce
the indentation level, but the right way to reduce the indentation
level, if it bothers readers too much, is to make the whole thing
inside the above if (o->clone) into a dedicated helper function
"void report_collided_checkout(void)", I would think.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-08-17 17:20                           ` Junio C Hamano
@ 2018-08-17 18:00                             ` Duy Nguyen
  0 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-08-17 18:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi

On Fri, Aug 17, 2018 at 10:20:36AM -0700, Junio C Hamano wrote:
> I highly suspect that the above was written in that way to reduce
> the indentation level, but the right way to reduce the indentation
> level, if it bothers readers too much, is to make the whole thing
> inside the above if (o->clone) into a dedicated helper function
> "void report_collided_checkout(void)", I would think.

I read my mind. I thought of separating into a helper function too,
but was not happy that the clearing CE_MATCHED in preparation for this
test is in check_updates(), but the cleaning up CE_MATCHED() is in the
helper function.

So here is the version that separates _both_ phases into helper
functions.

-- 8< --
Subject: [PATCH v6] clone: report duplicate entries on case-insensitive filesystems

Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.

This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c  |  1 +
 cache.h          |  1 +
 entry.c          | 31 +++++++++++++++++++++++++++++++
 t/t5601-clone.sh |  8 +++++++-
 unpack-trees.c   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.h   |  1 +
 6 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
 	memset(&opts, 0, sizeof opts);
 	opts.update = 1;
 	opts.merge = 1;
+	opts.clone = 1;
 	opts.fn = oneway_merge;
 	opts.verbose_update = (option_verbosity >= 0);
 	opts.src_index = &the_index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
+		 clone:1,
 		 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..8766e27255 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 	return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+				   struct cache_entry *ce, struct stat *st)
+{
+	int i, trust_ino = check_stat;
+
+#if defined(GIT_WINDOWS_NATIVE)
+	trust_ino = 0;
+#endif
+
+	ce->ce_flags |= CE_MATCHED;
+
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		if (dup == ce)
+			break;
+
+		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+			continue;
+
+		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
+		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+			dup->ce_flags |= CE_MATCHED;
+			break;
+		}
+	}
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce,
 			return -1;
 		}
 
+		if (state->clone)
+			mark_colliding_entries(state, ce, &st);
+
 		/*
 		 * We unlink the old file, to get the new one with the
 		 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
 			git hash-object -w -t tree --stdin) &&
 		c=$(git commit-tree -m bogus $t) &&
 		git update-ref refs/heads/bogus $c &&
-		git clone -b bogus . bogus
+		git clone -b bogus . bogus 2>warning
 	)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+	grep X icasefs/warning &&
+	grep x icasefs/warning &&
+	test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
 partial_clone () {
 	       SERVER="$1" &&
 	       URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..213da8bbb4 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -345,6 +345,46 @@ static struct progress *get_progress(struct unpack_trees_options *o)
 	return start_delayed_progress(_("Checking out files"), total);
 }
 
+static void setup_collided_checkout_detection(struct checkout *state,
+					      struct index_state *index)
+{
+	int i;
+
+	state->clone = 1;
+	for (i = 0; i < index->cache_nr; i++)
+		index->cache[i]->ce_flags &= ~CE_MATCHED;
+}
+
+static void report_collided_checkout(struct index_state *index)
+{
+	struct string_list list = STRING_LIST_INIT_NODUP;
+	int i;
+
+	for (i = 0; i < index->cache_nr; i++) {
+		struct cache_entry *ce = index->cache[i];
+
+		if (!(ce->ce_flags & CE_MATCHED))
+			continue;
+
+		string_list_append(&list, ce->name);
+		ce->ce_flags &= ~CE_MATCHED;
+	}
+
+	list.cmp = fspathcmp;
+	string_list_sort(&list);
+
+	if (list.nr) {
+		warning(_("the following paths have collided (e.g. case-sensitive paths\n"
+			  "on a case-insensitive filesystem) and only one from the same\n"
+			  "colliding group is in the working tree:\n"));
+
+		for (i = 0; i < list.nr; i++)
+			fprintf(stderr, "  '%s'\n", list.items[i].string);
+	}
+
+	string_list_clear(&list, 0);
+}
+
 static int check_updates(struct unpack_trees_options *o)
 {
 	unsigned cnt = 0;
@@ -359,6 +399,9 @@ static int check_updates(struct unpack_trees_options *o)
 	state.refresh_cache = 1;
 	state.istate = index;
 
+	if (o->clone)
+		setup_collided_checkout_detection(&state, index);
+
 	progress = get_progress(o);
 
 	if (o->update)
@@ -423,6 +466,10 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+	if (o->clone)
+		report_collided_checkout(index);
+
 	return errs != 0;
 }
 
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..d940f1c5c2 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -42,6 +42,7 @@ struct unpack_trees_options {
 	unsigned int reset,
 		     merge,
 		     update,
+		     clone,
 		     index_only,
 		     nontrivial_merge,
 		     trivial_merges_only,
-- 
2.18.0.1004.g6639190530

-- 8< --
--
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-08-17 16:16                         ` [PATCH v5] " Nguyễn Thái Ngọc Duy
  2018-08-17 17:20                           ` Junio C Hamano
@ 2018-08-17 19:46                           ` Torsten Bögershausen
  2018-11-19  8:20                           ` Carlo Marcelo Arenas Belón
  2 siblings, 0 replies; 96+ messages in thread
From: Torsten Bögershausen @ 2018-08-17 19:46 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev

On Fri, Aug 17, 2018 at 06:16:45PM +0200, Nguyễn Thái Ngọc Duy wrote:

The whole patch looks good to me.
(I was just sending a different version, but your version is better :-)

One minor remark, should the line
warning: the following paths have collided 
start with a capital letter:
Warning: the following paths have collided 

> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> In the case that we can't rely on filesystem (via inode number) to do
> this check, fall back to fspathcmp() which is not perfect but should
> not give false positives.
> 
> This patch is tested with vim-colorschemes and Sublime-Gitignore
> repositories on a JFS partition with case insensitive support on
> Linux.

Now even tested under Mac OS/HFS+

[]
>  '
>  
> +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '

My ambition is to run the test under Windows (both CYGWIN and native) next week,
so that we can remove !MINGW and !CYGWIN 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-08-17 16:16                         ` [PATCH v5] " Nguyễn Thái Ngọc Duy
  2018-08-17 17:20                           ` Junio C Hamano
  2018-08-17 19:46                           ` Torsten Bögershausen
@ 2018-11-19  8:20                           ` Carlo Marcelo Arenas Belón
  2018-11-19 12:28                             ` Torsten Bögershausen
  2 siblings, 1 reply; 96+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2018-11-19  8:20 UTC (permalink / raw)
  To: git
  Cc: pclouds, git, gitster, newren, pawelparuzel95, peff, sandals,
	szeder.dev, tboegi

While I don't have an HFS+ volume to test, I suspect this patch should be
needed for both, even if I have to say thay even the broken output was
better than the current state.

Travis seems to be using a case sensitive filesystem so wouldn't catch this.

Was windows/cygwin tested?

Carlo
-- >8 --
Subject: [PATCH] entry: fix t5061 on macOS

b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
2018-08-17) was tested on Linux with an excemption for Windows that needs
to be expanded for macOS (using APFS), which then would show :

$ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'man2/_Exit.2'
  'man2/_exit.2'
  'man3/NAN.3'
  'man3/nan.3'

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 entry.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/entry.c b/entry.c
index 5d136c5d55..3845f570f7 100644
--- a/entry.c
+++ b/entry.c
@@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
 {
 	int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
 	trust_ino = 0;
 #endif
 
-- 
2.20.0.rc0


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19  8:20                           ` Carlo Marcelo Arenas Belón
@ 2018-11-19 12:28                             ` Torsten Bögershausen
  2018-11-19 17:14                               ` Carlo Arenas
  2018-11-19 17:21                               ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones
  0 siblings, 2 replies; 96+ messages in thread
From: Torsten Bögershausen @ 2018-11-19 12:28 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón, git
  Cc: pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev

On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
> While I don't have an HFS+ volume to test, I suspect this patch should be
> needed for both, even if I have to say thay even the broken output was
> better than the current state.
> 
> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
> 
> Was windows/cygwin tested?
> 
> Carlo
> -- >8 --
> Subject: [PATCH] entry: fix t5061 on macOS
> 
> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
> 2018-08-17) was tested on Linux with an excemption for Windows that needs
> to be expanded for macOS (using APFS), which then would show :
> 
> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'man2/_Exit.2'
>   'man2/_exit.2'
>   'man3/NAN.3'
>   'man3/nan.3'
> 
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  entry.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..3845f570f7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
>  {
>  	int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>  	trust_ino = 0;
>  #endif
>  
> 

Sorry,
but I can't reproduce your problem here.

Did you test it on Mac ?
If I run t5601 on a case sensitive files system
(Mac, mounted NFS, exported from Linux)
I get:
ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
!MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

And if I run it on a case-insensitive HFS+,
I get
ok 99 - colliding file detection

So what exactly are you trying to fix ?


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 12:28                             ` Torsten Bögershausen
@ 2018-11-19 17:14                               ` Carlo Arenas
  2018-11-19 18:24                                 ` Duy Nguyen
  2018-11-19 17:21                               ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones
  1 sibling, 1 reply; 96+ messages in thread
From: Carlo Arenas @ 2018-11-19 17:14 UTC (permalink / raw)
  To: tboegi
  Cc: git, pclouds, git, gitster, newren, pawelparuzel95, peff,
	sandals, szeder.dev

On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> Did you test it on Mac ?

macOS 10.14.1 but only using APFS, did you test my patch with HFS+?

> So what exactly are you trying to fix ?

I get

not ok 99 - colliding file detection
#
# grep X icasefs/warning &&
# grep x icasefs/warning &&
# test_i18ngrep "the following paths have collided" icasefs/warning
#

and the output of "warning" only shows one of the conflicting files,
instead of both:

Cloning into 'bogus'...
done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'x'

Carlo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 12:28                             ` Torsten Bögershausen
  2018-11-19 17:14                               ` Carlo Arenas
@ 2018-11-19 17:21                               ` Ramsay Jones
  2018-11-19 19:39                                 ` Carlo Arenas
  1 sibling, 1 reply; 96+ messages in thread
From: Ramsay Jones @ 2018-11-19 17:21 UTC (permalink / raw)
  To: Torsten Bögershausen, Carlo Marcelo Arenas Belón, git
  Cc: pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev



On 19/11/2018 12:28, Torsten Bögershausen wrote:
> On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
>> While I don't have an HFS+ volume to test, I suspect this patch should be
>> needed for both, even if I have to say thay even the broken output was
>> better than the current state.
>>
>> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
>>
>> Was windows/cygwin tested?
>>
>> Carlo
>> -- >8 --
>> Subject: [PATCH] entry: fix t5061 on macOS
>>
>> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
>> 2018-08-17) was tested on Linux with an excemption for Windows that needs
>> to be expanded for macOS (using APFS), which then would show :
>>
>> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
>> warning: the following paths have collided (e.g. case-sensitive paths
>> on a case-insensitive filesystem) and only one from the same
>> colliding group is in the working tree:
>>
>>   'man2/_Exit.2'
>>   'man2/_exit.2'
>>   'man3/NAN.3'
>>   'man3/nan.3'
>>
>> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>> ---
>>  entry.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/entry.c b/entry.c
>> index 5d136c5d55..3845f570f7 100644
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
>>  {
>>  	int i, trust_ino = check_stat;
>>  
>> -#if defined(GIT_WINDOWS_NATIVE)
>> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>>  	trust_ino = 0;
>>  #endif
>>  
>>
> 
> Sorry,
> but I can't reproduce your problem here.
> 
> Did you test it on Mac ?
> If I run t5601 on a case sensitive files system
> (Mac, mounted NFS, exported from Linux)
> I get:
> ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
> !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

I tested v2.20.0-rc0 on cygwin last night and it passed just fine.
I just ran t5601-clone.sh on its own and got:

    $ ./t5601-clone.sh
    ...
    ok 98 - clone on case-insensitive fs
    ok 99 # skip colliding file detection (missing !CYGWIN of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)
    ok 100 - partial clone
    ok 101 - partial clone: warn if server does not support object filtering
    ok 102 - batch missing blob request during checkout
    ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks
    # passed all 103 test(s)
    # SKIP no web server found at '/usr/sbin/apache2'
    1..103
    $ 

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 17:14                               ` Carlo Arenas
@ 2018-11-19 18:24                                 ` Duy Nguyen
  2018-11-19 21:03                                   ` Duy Nguyen
  0 siblings, 1 reply; 96+ messages in thread
From: Duy Nguyen @ 2018-11-19 18:24 UTC (permalink / raw)
  To: Carlo Arenas
  Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler,
	Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King,
	brian m. carlson, SZEDER Gábor

On Mon, Nov 19, 2018 at 6:14 PM Carlo Arenas <carenas@gmail.com> wrote:
>
> On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen <tboegi@web.de> wrote:
> >
> > Did you test it on Mac ?
>
> macOS 10.14.1 but only using APFS, did you test my patch with HFS+?
>
> > So what exactly are you trying to fix ?
>
> I get
>
> not ok 99 - colliding file detection
> #
> # grep X icasefs/warning &&
> # grep x icasefs/warning &&
> # test_i18ngrep "the following paths have collided" icasefs/warning
> #
>
> and the output of "warning" only shows one of the conflicting files,
> instead of both:
>
> Cloning into 'bogus'...
> done.
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
>
>   'x'
>
> Carlo

Could you send me the "index" file in  t/trash\
directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
"stat /path/to/icase/bogus/x"

My only explanation is somehow the inode value we save is not the same
one on disk, which is weird and could even cause other problems. I'd
like to know why this happens before trying to fix anything.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 17:21                               ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones
@ 2018-11-19 19:39                                 ` Carlo Arenas
  0 siblings, 0 replies; 96+ messages in thread
From: Carlo Arenas @ 2018-11-19 19:39 UTC (permalink / raw)
  To: ramsay
  Cc: tboegi, git, pclouds, git, gitster, newren, pawelparuzel95, peff,
	sandals, szeder.dev

On Mon, Nov 19, 2018 at 9:23 AM Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>     ok 99 # skip colliding file detection (missing !CYGWIN of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

you need to enable this specific test first (removing !CYGWIN) so it
doesn't get skipped

Carlo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 18:24                                 ` Duy Nguyen
@ 2018-11-19 21:03                                   ` Duy Nguyen
  2018-11-19 21:04                                     ` Duy Nguyen
                                                       ` (3 more replies)
  0 siblings, 4 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-11-19 21:03 UTC (permalink / raw)
  To: Carlo Arenas
  Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler,
	Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King,
	brian m. carlson, SZEDER Gábor

First of all, Ramsay, it would be great if you could test the below
patch and see if it works on Cygwin. I assume since Cygwin shares the
underlying filesystem, it will share the same "no trusting inode"
issue with native builds (or it calculates inodes anyway using some
other source?).

Back to the APFS problem...

On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote:
> Could you send me the "index" file in  t/trash\
> directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
> "stat /path/to/icase/bogus/x"
> 
> My only explanation is somehow the inode value we save is not the same
> one on disk, which is weird and could even cause other problems. I'd
> like to know why this happens before trying to fix anything.

Thanks Carlo for the file and "stat" output. The problem is APFS has
64-bit inode (according to the Internet) while we store inodes as
32-bit, so it's truncated. Which means this comparison

    sd_ino == st_ino

is never true because sd_ino is truncated (0x2121063) while st_ino is
not (0x202121063).

Carlo, it would be great if you could test this patch also with
APFS. It should fix problem. We will have to deal with the same
truncated inode elsewhere to make sure we index refresh performance
does not degrade on APFS. But that's a separate problem. Thank you for
bringing this up.

-- 8< --
diff --git a/entry.c b/entry.c
index 5d136c5d55..809d3e2ba7 100644
--- a/entry.c
+++ b/entry.c
@@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout *state,
 {
 	int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
 	trust_ino = 0;
 #endif
 
 	ce->ce_flags |= CE_MATCHED;
 
-	for (i = 0; i < state->istate->cache_nr; i++) {
+	for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
 		if (dup == ce)
@@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state,
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
 
-		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
-		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+		if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {
 			dup->ce_flags |= CE_MATCHED;
+			return;
+		}
+	}
+
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		if (dup == ce)
 			break;
+
+		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+			continue;
+
+		if (!fspathcmp(ce->name, dup->name)) {
+			dup->ce_flags |= CE_MATCHED;
+			return;
 		}
 	}
 }
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f1a49e94f5..c28d51bd59 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
 	)
 '
 
-test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
 	grep X icasefs/warning &&
 	grep x icasefs/warning &&
 	test_i18ngrep "the following paths have collided" icasefs/warning
-- 8< --

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 21:03                                   ` Duy Nguyen
@ 2018-11-19 21:04                                     ` Duy Nguyen
  2018-11-19 21:17                                     ` Duy Nguyen
                                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-11-19 21:04 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Carlo Arenas, Torsten Bögershausen, Git Mailing List,
	Jeff Hostetler, Junio C Hamano, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor

... and I "dear Ramsay" without CCing him.. sigh.. sorry for the noise.

On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> First of all, Ramsay, it would be great if you could test the below
> patch and see if it works on Cygwin. I assume since Cygwin shares the
> underlying filesystem, it will share the same "no trusting inode"
> issue with native builds (or it calculates inodes anyway using some
> other source?).
>
> Back to the APFS problem...
>
> On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote:
> > Could you send me the "index" file in  t/trash\
> > directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
> > "stat /path/to/icase/bogus/x"
> >
> > My only explanation is somehow the inode value we save is not the same
> > one on disk, which is weird and could even cause other problems. I'd
> > like to know why this happens before trying to fix anything.
>
> Thanks Carlo for the file and "stat" output. The problem is APFS has
> 64-bit inode (according to the Internet) while we store inodes as
> 32-bit, so it's truncated. Which means this comparison
>
>     sd_ino == st_ino
>
> is never true because sd_ino is truncated (0x2121063) while st_ino is
> not (0x202121063).
>
> Carlo, it would be great if you could test this patch also with
> APFS. It should fix problem. We will have to deal with the same
> truncated inode elsewhere to make sure we index refresh performance
> does not degrade on APFS. But that's a separate problem. Thank you for
> bringing this up.
>
> -- 8< --
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..809d3e2ba7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout *state,
>  {
>         int i, trust_ino = check_stat;
>
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
>         trust_ino = 0;
>  #endif
>
>         ce->ce_flags |= CE_MATCHED;
>
> -       for (i = 0; i < state->istate->cache_nr; i++) {
> +       for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {
>                 struct cache_entry *dup = state->istate->cache[i];
>
>                 if (dup == ce)
> @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state,
>                 if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>                         continue;
>
> -               if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> -                   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
> +               if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {
>                         dup->ce_flags |= CE_MATCHED;
> +                       return;
> +               }
> +       }
> +
> +       for (i = 0; i < state->istate->cache_nr; i++) {
> +               struct cache_entry *dup = state->istate->cache[i];
> +
> +               if (dup == ce)
>                         break;
> +
> +               if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> +                       continue;
> +
> +               if (!fspathcmp(ce->name, dup->name)) {
> +                       dup->ce_flags |= CE_MATCHED;
> +                       return;
>                 }
>         }
>  }
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index f1a49e94f5..c28d51bd59 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
>         )
>  '
>
> -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
> +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
>         grep X icasefs/warning &&
>         grep x icasefs/warning &&
>         test_i18ngrep "the following paths have collided" icasefs/warning
> -- 8< --



-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 21:03                                   ` Duy Nguyen
  2018-11-19 21:04                                     ` Duy Nguyen
@ 2018-11-19 21:17                                     ` Duy Nguyen
  2018-11-19 23:29                                     ` Ramsay Jones
  2018-11-20  2:22                                     ` Junio C Hamano
  3 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-11-19 21:17 UTC (permalink / raw)
  To: Carlo Arenas
  Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler,
	Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King,
	brian m. carlson, SZEDER Gábor

On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen <pclouds@gmail.com> wrote:
> Thanks Carlo for the file and "stat" output. The problem is APFS has
> 64-bit inode (according to the Internet) while we store inodes as
> 32-bit, so it's truncated.
> ...
> We will have to deal with the same
> truncated inode elsewhere to make sure we index refresh performance
> does not degrade on APFS.

... and we don't have a problem there. Either Linus predicted dealing
with 64-bit inodes, or he had a habit of casting st_ino to unsigned
int, I cannot tell. This code

    ce->st_ino != (unsigned int)st->st_ino

is from e83c516331 (Initial revision of "git", the information manager
from hell - 2005-04-07) and it's still used today for comparing sd_ino
with st->st_ino in read-cache.c. I guess I should have copied and
pasted more often.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 21:03                                   ` Duy Nguyen
  2018-11-19 21:04                                     ` Duy Nguyen
  2018-11-19 21:17                                     ` Duy Nguyen
@ 2018-11-19 23:29                                     ` Ramsay Jones
  2018-11-19 23:54                                       ` Ramsay Jones
  2018-11-20  2:22                                     ` Junio C Hamano
  3 siblings, 1 reply; 96+ messages in thread
From: Ramsay Jones @ 2018-11-19 23:29 UTC (permalink / raw)
  To: Duy Nguyen, Carlo Arenas
  Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler,
	Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King,
	brian m. carlson, SZEDER Gábor



On 19/11/2018 21:03, Duy Nguyen wrote:
> First of all, Ramsay, it would be great if you could test the below
> patch and see if it works on Cygwin. I assume since Cygwin shares the
> underlying filesystem, it will share the same "no trusting inode"
> issue with native builds (or it calculates inodes anyway using some
> other source?).

Hmm, I have no idea why you would like me to try this patch - care
to explain? [I just saw, "Has this been tested on cygwin?" and, since
it has been happily passing for some time, responded yes!]

Just for the giggles, I removed the !CYGWIN prerequisite from the
test and when, as expected, the test failed, had a look around:

$ pwd
/home/ramsay/git/t/trash directory.t5601-clone
$ cat icasefs/warning 
Cloning into 'bogus'...
done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'x'
$ cd icasefs/bogus
$ ls -l
total 0
-rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x
$ git ls-files --debug
ignoring EOIE extension
X
  ctime: 1542667201:664036600
  mtime: 1542667201:663055400
  dev: 2378432	ino: 324352
  uid: 1001	gid: 513
  size: 0	flags: 0
x
  ctime: 1542667201:665026800
  mtime: 1542667201:665026800
  dev: 2378432	ino: 324352
  uid: 1001	gid: 513
  size: 0	flags: 0
$ 

So, both X and x are in the index with the same inode number.

Does that help?

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 23:29                                     ` Ramsay Jones
@ 2018-11-19 23:54                                       ` Ramsay Jones
  2018-11-20  1:05                                         ` Carlo Arenas
  0 siblings, 1 reply; 96+ messages in thread
From: Ramsay Jones @ 2018-11-19 23:54 UTC (permalink / raw)
  To: Duy Nguyen, Carlo Arenas
  Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler,
	Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King,
	brian m. carlson, SZEDER Gábor



On 19/11/2018 23:29, Ramsay Jones wrote:
> 
> 
> On 19/11/2018 21:03, Duy Nguyen wrote:
>> First of all, Ramsay, it would be great if you could test the below
>> patch and see if it works on Cygwin. I assume since Cygwin shares the
>> underlying filesystem, it will share the same "no trusting inode"
>> issue with native builds (or it calculates inodes anyway using some
>> other source?).
> 
> Hmm, I have no idea why you would like me to try this patch - care
> to explain? [I just saw, "Has this been tested on cygwin?" and, since
> it has been happily passing for some time, responded yes!]
> 
> Just for the giggles, I removed the !CYGWIN prerequisite from the
> test and when, as expected, the test failed, had a look around:
> 
> $ pwd
> /home/ramsay/git/t/trash directory.t5601-clone
> $ cat icasefs/warning 
> Cloning into 'bogus'...
> done.
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'x'
> $ cd icasefs/bogus
> $ ls -l
> total 0
> -rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x
> $ git ls-files --debug
> ignoring EOIE extension
> X
>   ctime: 1542667201:664036600
>   mtime: 1542667201:663055400
>   dev: 2378432	ino: 324352
>   uid: 1001	gid: 513
>   size: 0	flags: 0
> x
>   ctime: 1542667201:665026800
>   mtime: 1542667201:665026800
>   dev: 2378432	ino: 324352
>   uid: 1001	gid: 513
>   size: 0	flags: 0
> $ 
> 
> So, both X and x are in the index with the same inode number.
> 
> Does that help?

Well, I haven't even looked at the patch, but when I apply it to
the current 'pu' branch (just what I happened to have checked out)
and run that one test:

$ ./t5601-clone.sh
...
ok 96 - shallow clone locally
ok 97 - GIT_TRACE_PACKFILE produces a usable pack
ok 98 - clone on case-insensitive fs
ok 99 - colliding file detection
ok 100 - partial clone
ok 101 - partial clone: warn if server does not support object filtering
ok 102 - batch missing blob request during checkout
ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks
# passed all 103 test(s)
# SKIP no web server found at '/usr/sbin/apache2'
1..103
$ 

... the colliding file detection test passes!

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 23:54                                       ` Ramsay Jones
@ 2018-11-20  1:05                                         ` Carlo Arenas
  0 siblings, 0 replies; 96+ messages in thread
From: Carlo Arenas @ 2018-11-20  1:05 UTC (permalink / raw)
  To: ramsay
  Cc: pclouds, tboegi, git, git, gitster, newren, pawelparuzel95, peff,
	sandals, szeder.dev

ok 99 - colliding file detection

as well in macOS with APFS

Carlo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
  2018-11-19 21:03                                   ` Duy Nguyen
                                                       ` (2 preceding siblings ...)
  2018-11-19 23:29                                     ` Ramsay Jones
@ 2018-11-20  2:22                                     ` Junio C Hamano
  2018-11-20 16:28                                       ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy
  3 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2018-11-20  2:22 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Carlo Arenas, Torsten Bögershausen, Git Mailing List,
	Jeff Hostetler, Elijah Newren, Paweł Paruzel, Jeff King,
	brian m. carlson, SZEDER Gábor

Duy Nguyen <pclouds@gmail.com> writes:

>  
> -	for (i = 0; i < state->istate->cache_nr; i++) {
> +	for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {

There is some typo here, but modulo that this looks like the right
thing to do.

> @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state,
>  		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>  			continue;
>  
> -		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> -		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
> +		if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {

This is slightly unfortunate but is the best we can do for now.  

The reason why the design of the "cached stat info" mechanism allows
the sd_* fields to be narrower than the underlying fields is because
they are used only as an early-culling measure (if the value saved
with truncation is different from the current value with truncation,
then they cannot possibly be the same, so we know that the file
changed without looking at the contents).

This use however is different.  Equality of truncated values
immediately declare CE_MATCHED here, producing false negative, which
is not what we want, no?

>  			dup->ce_flags |= CE_MATCHED;
> +			return;
> +		}
> +	}


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH] clone: fix colliding file detection on APFS
  2018-11-20  2:22                                     ` Junio C Hamano
@ 2018-11-20 16:28                                       ` Nguyễn Thái Ngọc Duy
  2018-11-20 19:20                                         ` Ramsay Jones
                                                           ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-11-20 16:28 UTC (permalink / raw)
  To: gitster
  Cc: carenas, git, git, newren, pawelparuzel95, pclouds, peff,
	sandals, szeder.dev, tboegi, Ramsay Jones

Commit b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems - 2018-08-17) adds a warning to user when cloning a repo
with case-sensitive file names on a case-insensitive file system. The
"find duplicate file" check was doing by comparing inode number (and
only fall back to fspathcmp() when inode is known to be unreliable
because fspathcmp() can't cover all case folding cases).

The inode check is very simple, and wrong. It compares between a
32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When
an inode is larger than 2^32 (which seems to be the case for APFS), it
will be truncated and stored in sd_ino, but comparing with itself will
fail.

As a result, instead of showing a pair of files that have the same
name, we show just one file (marked before the beginning of the
loop). We fail to find the original one.

The fix could be just a simple type cast (*)

    dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino

but this is no longer a reliable test, there are 4G possible inodes
that can match sd_ino because we only match the lower 32 bits instead
of full 64 bits.

There are two options to go. Either we ignore inode and go with
fspathcmp() on Apple platform. This means we can't do accurate inode
check on HFS anymore, or even on APFS when inode numbers are still
below 2^32.

Or we just to to reduce the odds of matching a wrong file by checking
more attributes, counting mostly on st_size because st_xtime is likely
the same. This patch goes with this direction, hoping that false
positive chances are too small to be seen in practice.

While at there, enable the test on Cygwin (verified working by Ramsay
Jones)

(*) this is also already done inside match_stat_data()

Reported-by: Carlo Arenas <carenas@gmail.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 So I'm going with match_stat_data(). But I don't know, perhaps just
 ignoring inode (like Carlo's original patch) is safer/better?

 Tested on case-insensitive JFS on Linux. But I don't think it really
 matters because I'm not even sure if I could push inode above 2^32
 with this. Hacking JFS for this test sounds fun, but no time for that.

 entry.c          | 4 ++--
 t/t5601-clone.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/entry.c b/entry.c
index 5d136c5d55..0a3c451f5f 100644
--- a/entry.c
+++ b/entry.c
@@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
 {
 	int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
 	trust_ino = 0;
 #endif
 
@@ -419,7 +419,7 @@ static void mark_colliding_entries(const struct checkout *state,
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
 
-		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
+		if ((trust_ino && !match_stat_data(&dup->ce_stat_data, st)) ||
 		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
 			dup->ce_flags |= CE_MATCHED;
 			break;
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f1a49e94f5..c28d51bd59 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
 	)
 '
 
-test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
 	grep X icasefs/warning &&
 	grep x icasefs/warning &&
 	test_i18ngrep "the following paths have collided" icasefs/warning
-- 
2.19.1.1327.g328c130451.dirty


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] clone: fix colliding file detection on APFS
  2018-11-20 16:28                                       ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy
@ 2018-11-20 19:20                                         ` Ramsay Jones
  2018-11-20 19:35                                         ` Carlo Arenas
  2018-11-22 17:59                                         ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi
  2 siblings, 0 replies; 96+ messages in thread
From: Ramsay Jones @ 2018-11-20 19:20 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, gitster
  Cc: carenas, git, git, newren, pawelparuzel95, peff, sandals,
	szeder.dev, tboegi



On 20/11/2018 16:28, Nguyễn Thái Ngọc Duy wrote:
> Commit b878579ae7 (clone: report duplicate entries on case-insensitive
> filesystems - 2018-08-17) adds a warning to user when cloning a repo
> with case-sensitive file names on a case-insensitive file system. The
> "find duplicate file" check was doing by comparing inode number (and
> only fall back to fspathcmp() when inode is known to be unreliable
> because fspathcmp() can't cover all case folding cases).
> 
> The inode check is very simple, and wrong. It compares between a
> 32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When
> an inode is larger than 2^32 (which seems to be the case for APFS), it
> will be truncated and stored in sd_ino, but comparing with itself will
> fail.
> 
> As a result, instead of showing a pair of files that have the same
> name, we show just one file (marked before the beginning of the
> loop). We fail to find the original one.
> 
> The fix could be just a simple type cast (*)
> 
>     dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino
> 
> but this is no longer a reliable test, there are 4G possible inodes
> that can match sd_ino because we only match the lower 32 bits instead
> of full 64 bits.
> 
> There are two options to go. Either we ignore inode and go with
> fspathcmp() on Apple platform. This means we can't do accurate inode
> check on HFS anymore, or even on APFS when inode numbers are still
> below 2^32.
> 
> Or we just to to reduce the odds of matching a wrong file by checking
> more attributes, counting mostly on st_size because st_xtime is likely
> the same. This patch goes with this direction, hoping that false
> positive chances are too small to be seen in practice.
> 
> While at there, enable the test on Cygwin (verified working by Ramsay
> Jones)

Well, no, I tested the previous version of this patch. However, this
patch also passes the test. (Note _test_ singular - in order to check
that this patch doesn't cause a regression I would need to run the
whole test-suite - that takes 3.5 hours, if I'm not doing anything
else!)

> 
> (*) this is also already done inside match_stat_data()
> 
> Reported-by: Carlo Arenas <carenas@gmail.com>
> Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  So I'm going with match_stat_data(). But I don't know, perhaps just
>  ignoring inode (like Carlo's original patch) is safer/better?
> 
>  Tested on case-insensitive JFS on Linux. But I don't think it really
>  matters because I'm not even sure if I could push inode above 2^32
>  with this. Hacking JFS for this test sounds fun, but no time for that.
> 
>  entry.c          | 4 ++--
>  t/t5601-clone.sh | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..0a3c451f5f 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
>  {
>  	int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)

I was a little curious about this (but couldn't be bothered actually
read the code, post-application), so I removed this hunk from the
patch, rebuilt and ran the test again: it _passed_ the test. :-D

So, ...

ATB,
Ramsay Jones

>  	trust_ino = 0;
>  #endif
>  
> @@ -419,7 +419,7 @@ static void mark_colliding_entries(const struct checkout *state,
>  		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>  			continue;
>  
> -		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> +		if ((trust_ino && !match_stat_data(&dup->ce_stat_data, st)) ||
>  		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
>  			dup->ce_flags |= CE_MATCHED;
>  			break;
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index f1a49e94f5..c28d51bd59 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
>  	)
>  '
>  
> -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
> +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
>  	grep X icasefs/warning &&
>  	grep x icasefs/warning &&
>  	test_i18ngrep "the following paths have collided" icasefs/warning
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] clone: fix colliding file detection on APFS
  2018-11-20 16:28                                       ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy
  2018-11-20 19:20                                         ` Ramsay Jones
@ 2018-11-20 19:35                                         ` Carlo Arenas
  2018-11-20 19:38                                           ` Duy Nguyen
  2018-11-22 17:59                                         ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi
  2 siblings, 1 reply; 96+ messages in thread
From: Carlo Arenas @ 2018-11-20 19:35 UTC (permalink / raw)
  To: pclouds
  Cc: gitster, git, git, newren, pawelparuzel95, peff, sandals,
	szeder.dev, tboegi, ramsay

Tested-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>

in macOS 10.14.1 with APFS
in Linux using VFAT (for the lulz)

IMHO it would be ideal if test would be enabled/validated for windows
(native, not only cygwin) as it might even work without the override
and if we are to see conflicts, that is probably where most users with
file insensitive filesystems might be found

Carlo

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH] clone: fix colliding file detection on APFS
  2018-11-20 19:35                                         ` Carlo Arenas
@ 2018-11-20 19:38                                           ` Duy Nguyen
  0 siblings, 0 replies; 96+ messages in thread
From: Duy Nguyen @ 2018-11-20 19:38 UTC (permalink / raw)
  To: Carlo Arenas
  Cc: Junio C Hamano, Jeff Hostetler, Git Mailing List, Elijah Newren,
	Paweł Paruzel, Jeff King, brian m. carlson,
	SZEDER Gábor, Torsten Bögershausen, Ramsay Jones

On Tue, Nov 20, 2018 at 8:35 PM Carlo Arenas <carenas@gmail.com> wrote:
> IMHO it would be ideal if test would be enabled/validated for windows
> (native, not only cygwin) as it might even work without the override
> and if we are to see conflicts, that is probably where most users with
> file insensitive filesystems might be found

Yes but I can't test on Windows so I will not enable the test until I
got a report that it's working there.
-- 
Duy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW
  2018-11-20 16:28                                       ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy
  2018-11-20 19:20                                         ` Ramsay Jones
  2018-11-20 19:35                                         ` Carlo Arenas
@ 2018-11-22 17:59                                         ` tboegi
  2018-11-22 20:16                                           ` Carlo Marcelo Arenas Belón
  2 siblings, 1 reply; 96+ messages in thread
From: tboegi @ 2018-11-22 17:59 UTC (permalink / raw)
  To: git
  Cc: carenas, git, newren, pawelparuzel95, pclouds, peff, sandals,
	szeder.dev, tboegi, ramsay

From: Torsten Bögershausen <tboegi@web.de>

Commit b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems - 2018-08-17) adds a warning to user when cloning a repo
with case-sensitive file names on a case-insensitive file system.

This test has never been enabled for MINGW.
It had been working since day 1, but I forget to report that to the
author.
Enable it after a re-test.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---

The other day, I wanted to test Duys patch -
under MINGW - to see if the problem is catch(ed)
but hehe git am failed to apply - not a big desaster,
because is is already in master
Here is a follow-up, end we can end the match


 t/t5601-clone.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index c28d51bd59..8bbc7068ac 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
 	)
 '
 
-test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
+test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
 	grep X icasefs/warning &&
 	grep x icasefs/warning &&
 	test_i18ngrep "the following paths have collided" icasefs/warning
-- 
2.19.0.271.gfe8321ec05


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW
  2018-11-22 17:59                                         ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi
@ 2018-11-22 20:16                                           ` Carlo Marcelo Arenas Belón
  2018-11-23 11:24                                             ` Johannes Schindelin
  0 siblings, 1 reply; 96+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2018-11-22 20:16 UTC (permalink / raw)
  To: tboegi
  Cc: git, git, newren, pawelparuzel95, pclouds, peff, ramsay, sandals,
	szeder.dev

Which FS was this tested on?, is Git LFS I keep hearing about also considered
a "filesystem" for git?

Could you also test with the following applied on top?

Carlo
-- >8 --
Subject: [PATCH] entry: remove windows fallback to inode checking

this test is really FS specific, so is better to avoid any compiled
assumptions about the platform and let the user drive the fallback
through core.checkStat instead

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 entry.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/entry.c b/entry.c
index 0a3c451f5f..5ae74856e6 100644
--- a/entry.c
+++ b/entry.c
@@ -404,10 +404,6 @@ static void mark_colliding_entries(const struct checkout *state,
 {
 	int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
-	trust_ino = 0;
-#endif
-
 	ce->ce_flags |= CE_MATCHED;
 
 	for (i = 0; i < state->istate->cache_nr; i++) {
-- 
2.20.0.rc1


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW
  2018-11-22 20:16                                           ` Carlo Marcelo Arenas Belón
@ 2018-11-23 11:24                                             ` Johannes Schindelin
  0 siblings, 0 replies; 96+ messages in thread
From: Johannes Schindelin @ 2018-11-23 11:24 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón
  Cc: tboegi, git, git, newren, pawelparuzel95, pclouds, peff, ramsay,
	sandals, szeder.dev

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

Hi Carlo,

On Thu, 22 Nov 2018, Carlo Marcelo Arenas Belón wrote:

> Subject: [PATCH] entry: remove windows fallback to inode checking
> 
> this test is really FS specific, so is better to avoid any compiled
> assumptions about the platform and let the user drive the fallback
> through core.checkStat instead
> 
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  entry.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index 0a3c451f5f..5ae74856e6 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,10 +404,6 @@ static void mark_colliding_entries(const struct checkout *state,
>  {
>  	int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
> -	trust_ino = 0;
> -#endif
> -

No, we cannot drop this. You may not see it in git.git's source code, but
in Git for Windows' patches, we have an experimental feature (which had
seemed to stabilize, but Ben Peart is currently doing wonders with it,
improving the performance substantially) for accelerating the file
metadata enumeration in a noticeable manner. The only way we can do that
is by *not* insisting on a correct inode.

Besides, IIRC even our regular stat() now "fails" to fill the inode field.

So no, we cannot do that. We can probably drop the `||
defined(__CYGINW__)` part (Cygwin even generates a fake inode for FAT,
where no equivalent is available, by hashing the full normalized path).
But you cannot drop the `GIT_WINDOWS_NATIVE` part.

Ciao,
Johannes

>  	ce->ce_flags |= CE_MATCHED;
>  
>  	for (i = 0; i < state->istate->cache_nr; i++) {
> -- 
> 2.20.0.rc1
> 
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2018-11-23 11:24 UTC | newest]

Thread overview: 96+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-27  9:59 Git clone and case sensitivity Paweł Paruzel
2018-07-27 20:59 ` brian m. carlson
2018-07-28  4:36   ` Duy Nguyen
2018-07-28  4:45     ` Duy Nguyen
2018-07-28  4:48       ` Jeff King
2018-07-28  5:11         ` Duy Nguyen
2018-07-28  9:48           ` Simon Ruderich
2018-07-28  9:56           ` Jeff King
2018-07-28 18:05             ` brian m. carlson
2018-07-29  5:26             ` Duy Nguyen
2018-07-29  9:28               ` Jeff King
2018-07-30 15:27                 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
2018-07-31 18:23                   ` Torsten Bögershausen
2018-08-01 15:25                     ` Duy Nguyen
2018-07-31 18:44                   ` Elijah Newren
2018-07-31 19:12                     ` Junio C Hamano
2018-07-31 19:29                       ` Jeff King
2018-07-31 20:12                         ` Junio C Hamano
2018-07-31 20:37                           ` Jeff King
2018-07-31 20:57                             ` Junio C Hamano
2018-08-01 21:20                               ` Junio C Hamano
2018-08-02 14:43                                 ` Duy Nguyen
2018-08-02 16:27                                   ` Junio C Hamano
2018-08-02 19:06                                     ` Jeff King
2018-08-02 21:14                                       ` Junio C Hamano
2018-08-02 21:28                                         ` Jeff King
2018-08-03 18:23                                           ` Jeff Hostetler
2018-08-03 18:49                                             ` Junio C Hamano
2018-08-03 18:53                                             ` Jeff King
2018-08-05 14:01                                               ` Jeff Hostetler
2018-08-03 14:28                                   ` Torsten Bögershausen
2018-08-01 15:21                     ` Duy Nguyen
2018-07-31 19:13                   ` Junio C Hamano
2018-08-01 15:16                     ` Duy Nguyen
2018-08-07 19:01                   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-08-07 19:31                     ` Junio C Hamano
2018-08-08 19:48                       ` Jeff Hostetler
2018-08-08 22:31                         ` Jeff King
2018-08-09  0:41                           ` Junio C Hamano
2018-08-09 14:23                             ` Jeff King
2018-08-09 21:14                               ` Jeff Hostetler
2018-08-09 21:34                                 ` Jeff King
2018-08-09 21:40                                 ` Elijah Newren
2018-08-09 21:44                                   ` Jeff King
2018-08-09 21:53                                     ` Elijah Newren
2018-08-09 21:59                                       ` Jeff King
2018-08-09 23:05                                         ` Elijah Newren
2018-08-09 22:07                                   ` Junio C Hamano
2018-08-10 15:36                     ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy
2018-08-10 15:36                       ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
2018-08-10 16:42                         ` Junio C Hamano
2018-08-11 10:09                         ` SZEDER Gábor
2018-08-11 13:16                           ` Duy Nguyen
2018-08-13 16:55                             ` Junio C Hamano
2018-08-13 17:12                               ` Duy Nguyen
2018-08-10 16:12                       ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Junio C Hamano
2018-08-12  9:07                       ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy
2018-08-13 15:32                         ` Jeff Hostetler
2018-08-13 17:18                         ` Junio C Hamano
2018-08-15 19:08                         ` Torsten Bögershausen
2018-08-15 19:35                           ` Duy Nguyen
2018-08-16 15:56                             ` [PATCH] config.txt: clarify core.checkStat = minimal Nguyễn Thái Ngọc Duy
2018-08-16 17:01                               ` Junio C Hamano
2018-08-16 18:19                                 ` Duy Nguyen
2018-08-16 22:29                                   ` Junio C Hamano
2018-08-17 15:26                                   ` Junio C Hamano
2018-08-17 15:29                                     ` Duy Nguyen
2018-08-15 19:38                           ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano
2018-08-16 14:03                             ` Torsten Bögershausen
2018-08-16 15:42                               ` Duy Nguyen
2018-08-16 16:23                               ` Junio C Hamano
2018-08-17 16:16                         ` [PATCH v5] " Nguyễn Thái Ngọc Duy
2018-08-17 17:20                           ` Junio C Hamano
2018-08-17 18:00                             ` Duy Nguyen
2018-08-17 19:46                           ` Torsten Bögershausen
2018-11-19  8:20                           ` Carlo Marcelo Arenas Belón
2018-11-19 12:28                             ` Torsten Bögershausen
2018-11-19 17:14                               ` Carlo Arenas
2018-11-19 18:24                                 ` Duy Nguyen
2018-11-19 21:03                                   ` Duy Nguyen
2018-11-19 21:04                                     ` Duy Nguyen
2018-11-19 21:17                                     ` Duy Nguyen
2018-11-19 23:29                                     ` Ramsay Jones
2018-11-19 23:54                                       ` Ramsay Jones
2018-11-20  1:05                                         ` Carlo Arenas
2018-11-20  2:22                                     ` Junio C Hamano
2018-11-20 16:28                                       ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy
2018-11-20 19:20                                         ` Ramsay Jones
2018-11-20 19:35                                         ` Carlo Arenas
2018-11-20 19:38                                           ` Duy Nguyen
2018-11-22 17:59                                         ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi
2018-11-22 20:16                                           ` Carlo Marcelo Arenas Belón
2018-11-23 11:24                                             ` Johannes Schindelin
2018-11-19 17:21                               ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones
2018-11-19 19:39                                 ` Carlo Arenas
2018-07-31 19:39                 ` Git clone and case sensitivity Jeff Hostetler

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).