* Git clone and case sensitivity @ 2018-07-27 9:59 Paweł Paruzel 2018-07-27 20:59 ` brian m. carlson 0 siblings, 1 reply; 96+ messages in thread From: Paweł Paruzel @ 2018-07-27 9:59 UTC (permalink / raw) To: git Hi, Lately, I have been wondering why my test files in repo are modified after I clone it. It turned out to be two files: boolStyle_t_f and boolStyle_T_F. The system that pushed those files was case sensitive while my mac after High Sierra update had APFS which is by default case-insensitive. I highly suggest that git clone threw an exception when files are case sensitive and being cloned to a case insensitive system. This has caused problems with overriding files for test cases without any warning. Thanks in advance. Regards, Pawel Paruzel ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-27 9:59 Git clone and case sensitivity Paweł Paruzel @ 2018-07-27 20:59 ` brian m. carlson 2018-07-28 4:36 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: brian m. carlson @ 2018-07-27 20:59 UTC (permalink / raw) To: Paweł Paruzel; +Cc: git [-- Attachment #1: Type: text/plain, Size: 1399 bytes --] On Fri, Jul 27, 2018 at 11:59:33AM +0200, Paweł Paruzel wrote: > Hi, > > Lately, I have been wondering why my test files in repo are modified > after I clone it. It turned out to be two files: boolStyle_t_f and > boolStyle_T_F. > The system that pushed those files was case sensitive while my mac > after High Sierra update had APFS which is by default > case-insensitive. I highly suggest that git clone threw an exception > when files are case sensitive and being cloned to a case insensitive > system. This has caused problems with overriding files for test cases > without any warning. If we did what you proposed, it would be impossible to clone such a repository on a case-insensitive system. While this might be fine for a closed system such as inside a company, this would make many open source repositories unusable, even when the files differing in case are nonfunctional (like README and readme). This is actually one of a few ways people can make repositories that will show as modified on Windows or macOS systems, due to limitations in those OSes. If you want to be sure that your repository is unmodified after clone, you can ensure that the output of git status --porcelain is empty, such as by checking for a zero exit from "test $(git status --porcelain | wc -l) -eq 0". -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 867 bytes --] ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-27 20:59 ` brian m. carlson @ 2018-07-28 4:36 ` Duy Nguyen 2018-07-28 4:45 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: Duy Nguyen @ 2018-07-28 4:36 UTC (permalink / raw) To: brian m. carlson; +Cc: Paweł Paruzel, git On Fri, Jul 27, 2018 at 08:59:09PM +0000, brian m. carlson wrote: > On Fri, Jul 27, 2018 at 11:59:33AM +0200, Paweł Paruzel wrote: > > Hi, > > > > Lately, I have been wondering why my test files in repo are modified > > after I clone it. It turned out to be two files: boolStyle_t_f and > > boolStyle_T_F. > > The system that pushed those files was case sensitive while my mac > > after High Sierra update had APFS which is by default > > case-insensitive. I highly suggest that git clone threw an exception > > when files are case sensitive and being cloned to a case insensitive > > system. This has caused problems with overriding files for test cases > > without any warning. > > If we did what you proposed, it would be impossible to clone such a > repository on a case-insensitive system. I agree throwing a real exception would be bad. But how about detecting the problem and trying our best to keep the repo in somewhat usable state like this? This patch uses sparse checkout to hide all those paths that we fail to checkout, so you can still have a clean worktree to do things, as long as you don't touch those paths. -- 8< -- diff --git a/builtin/clone.c b/builtin/clone.c index 1d939af9d8..a6b5e2c948 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -711,6 +711,30 @@ static void update_head(const struct ref *our, const struct ref *remote, } } +static int enable_sparse_checkout_on_icase_fs(struct index_state *istate) +{ + int i; + int skip_count = 0; + FILE *fp = fopen(git_path("info/sparse"), "a+"); + + for (i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + if (!ce_skip_worktree(ce)) + continue; + if (!skip_count) { + git_config_set_multivar_gently("core.sparseCheckout", + "true", + CONFIG_REGEX_NONE, 0); + fprintf(fp, "# List of paths hidden by 'git clone'\n"); + } + fprintf(fp, "/%s\n", ce->name); + skip_count++; + } + fclose(fp); + + return skip_count; +} + static int checkout(int submodule_progress) { struct object_id oid; @@ -751,6 +775,7 @@ static int checkout(int submodule_progress) opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; opts.dst_index = &the_index; + opts.clone_checkout = 1; tree = parse_tree_indirect(&oid); parse_tree(tree); @@ -761,6 +786,12 @@ static int checkout(int submodule_progress) if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) die(_("unable to write new index file")); + if (enable_sparse_checkout_on_icase_fs(&the_index)) + warning("Paths that differ only in case are detected " + "and will not work correctly on this case-insensitive " + "filesystem. Sparse checkout has been enabled to hide " + "these paths."); + err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1), oid_to_hex(&oid), "1", NULL); diff --git a/cache.h b/cache.h index 8b447652a7..9ecf7ad952 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, + set_skipworktree_on_updated:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..ba21db63e7 100644 --- a/entry.c +++ b/entry.c @@ -447,6 +447,11 @@ int checkout_entry(struct cache_entry *ce, if (!changed) return 0; + if (state->set_skipworktree_on_updated) { + ce->ce_flags |= CE_SKIP_WORKTREE; + state->istate->cache_changed |= CE_ENTRY_CHANGED; + return 0; + } if (!state->force) { if (!state->quiet) fprintf(stderr, diff --git a/unpack-trees.c b/unpack-trees.c index 66741130ae..a8a24e0b13 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -358,6 +358,7 @@ static int check_updates(struct unpack_trees_options *o) state.quiet = 1; state.refresh_cache = 1; state.istate = index; + state.set_skipworktree_on_updated = o->clone_checkout; progress = get_progress(o); diff --git a/unpack-trees.h b/unpack-trees.h index c2b434c606..8ebe2e2ec5 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -49,6 +49,7 @@ struct unpack_trees_options { aggressive, skip_unmerged, initial_checkout, + clone_checkout, diff_index_cached, debug_unpack, skip_sparse_checkout, -- 8< -- > While this might be fine for a closed system such as inside a > company, this would make many open source repositories unusable, > even when the files differing in case are nonfunctional (like README > and readme). > > This is actually one of a few ways people can make repositories that > will show as modified on Windows or macOS systems, due to limitations in > those OSes. If you want to be sure that your repository is unmodified > after clone, you can ensure that the output of git status --porcelain is > empty, such as by checking for a zero exit from > "test $(git status --porcelain | wc -l) -eq 0". > -- > brian m. carlson: Houston, Texas, US > OpenPGP: https://keybase.io/bk2204 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 4:36 ` Duy Nguyen @ 2018-07-28 4:45 ` Duy Nguyen 2018-07-28 4:48 ` Jeff King 0 siblings, 1 reply; 96+ messages in thread From: Duy Nguyen @ 2018-07-28 4:45 UTC (permalink / raw) To: brian m. carlson; +Cc: Paweł Paruzel, Git Mailing List On Sat, Jul 28, 2018 at 6:36 AM Duy Nguyen <pclouds@gmail.com> wrote: > > On Fri, Jul 27, 2018 at 08:59:09PM +0000, brian m. carlson wrote: > > On Fri, Jul 27, 2018 at 11:59:33AM +0200, Paweł Paruzel wrote: > > > Hi, > > > > > > Lately, I have been wondering why my test files in repo are modified > > > after I clone it. It turned out to be two files: boolStyle_t_f and > > > boolStyle_T_F. > > > The system that pushed those files was case sensitive while my mac > > > after High Sierra update had APFS which is by default > > > case-insensitive. I highly suggest that git clone threw an exception > > > when files are case sensitive and being cloned to a case insensitive > > > system. This has caused problems with overriding files for test cases > > > without any warning. > > > > If we did what you proposed, it would be impossible to clone such a > > repository on a case-insensitive system. > > I agree throwing a real exception would be bad. But how about detecting > the problem and trying our best to keep the repo in somewhat usable > state like this? > > This patch uses sparse checkout to hide all those paths that we fail > to checkout, so you can still have a clean worktree to do things, as > long as you don't touch those paths. Side note. There may still be problems with this patch. Let's use vim-colorschemes.git as an example, which has darkBlue.vim and darkblue.vim. Say we have checked out darkBlue.vim and hidden darkblue.vim. When you update darkBlue.vim on worktree and then update the index, are we sure we will update darkBlue.vim entry and not (hidden) darkblue.vim? I am not sure. I don't think our lookup function is prepared to deal with this. Maybe it's best to hide both of them. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 4:45 ` Duy Nguyen @ 2018-07-28 4:48 ` Jeff King 2018-07-28 5:11 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-07-28 4:48 UTC (permalink / raw) To: Duy Nguyen; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List On Sat, Jul 28, 2018 at 06:45:43AM +0200, Duy Nguyen wrote: > > I agree throwing a real exception would be bad. But how about detecting > > the problem and trying our best to keep the repo in somewhat usable > > state like this? > > > > This patch uses sparse checkout to hide all those paths that we fail > > to checkout, so you can still have a clean worktree to do things, as > > long as you don't touch those paths. > > Side note. There may still be problems with this patch. Let's use > vim-colorschemes.git as an example, which has darkBlue.vim and > darkblue.vim. > > Say we have checked out darkBlue.vim and hidden darkblue.vim. When you > update darkBlue.vim on worktree and then update the index, are we sure > we will update darkBlue.vim entry and not (hidden) darkblue.vim? I am > not sure. I don't think our lookup function is prepared to deal with > this. Maybe it's best to hide both of them. It might be enough to just issue a warning and give an advise() hint that tells the user what's going on. Then they can decide what to do (hide both paths, or just work in the index, or move to a different fs, or complain to upstream). -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 4:48 ` Jeff King @ 2018-07-28 5:11 ` Duy Nguyen 2018-07-28 9:48 ` Simon Ruderich 2018-07-28 9:56 ` Jeff King 0 siblings, 2 replies; 96+ messages in thread From: Duy Nguyen @ 2018-07-28 5:11 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List On Sat, Jul 28, 2018 at 12:48:57AM -0400, Jeff King wrote: > On Sat, Jul 28, 2018 at 06:45:43AM +0200, Duy Nguyen wrote: > > > > I agree throwing a real exception would be bad. But how about detecting > > > the problem and trying our best to keep the repo in somewhat usable > > > state like this? > > > > > > This patch uses sparse checkout to hide all those paths that we fail > > > to checkout, so you can still have a clean worktree to do things, as > > > long as you don't touch those paths. > > > > Side note. There may still be problems with this patch. Let's use > > vim-colorschemes.git as an example, which has darkBlue.vim and > > darkblue.vim. > > > > Say we have checked out darkBlue.vim and hidden darkblue.vim. When you > > update darkBlue.vim on worktree and then update the index, are we sure > > we will update darkBlue.vim entry and not (hidden) darkblue.vim? I am > > not sure. I don't think our lookup function is prepared to deal with > > this. Maybe it's best to hide both of them. > > It might be enough to just issue a warning and give an advise() hint > that tells the user what's going on. Then they can decide what to do > (hide both paths, or just work in the index, or move to a different fs, > or complain to upstream). Yeah that may be the best option. Something like this perhaps? Not sure how much detail the advice should be here. -- 8< -- diff --git a/builtin/clone.c b/builtin/clone.c index 1d939af9d8..b47ad5877b 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -711,6 +711,30 @@ static void update_head(const struct ref *our, const struct ref *remote, } } +static int has_duplicate_icase_entries(struct index_state *istate) +{ + struct string_list list = STRING_LIST_INIT_NODUP; + int i; + int found = 0; + + for (i = 0; i < istate->cache_nr; i++) + string_list_append(&list, istate->cache[i]->name); + + list.cmp = strcasecmp; + string_list_sort(&list); + + for (i = 1; i < list.nr; i++) { + if (strcasecmp(list.items[i-1].string, + list.items[i].string)) + continue; + found = 1; + break; + } + string_list_clear(&list, 0); + + return found; +} + static int checkout(int submodule_progress) { struct object_id oid; @@ -761,6 +785,11 @@ static int checkout(int submodule_progress) if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) die(_("unable to write new index file")); + if (ignore_case && has_duplicate_icase_entries(&the_index)) + warning(_("This repository has paths that only differ in case\n" + "and you have a case-insenitive filesystem which will\n" + "cause problems.")); + err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1), oid_to_hex(&oid), "1", NULL); -- 8< -- ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 5:11 ` Duy Nguyen @ 2018-07-28 9:48 ` Simon Ruderich 2018-07-28 9:56 ` Jeff King 1 sibling, 0 replies; 96+ messages in thread From: Simon Ruderich @ 2018-07-28 9:48 UTC (permalink / raw) To: Duy Nguyen Cc: Jeff King, brian m. carlson, Paweł Paruzel, Git Mailing List On Sat, Jul 28, 2018 at 07:11:05AM +0200, Duy Nguyen wrote: > static int checkout(int submodule_progress) > { > struct object_id oid; > @@ -761,6 +785,11 @@ static int checkout(int submodule_progress) > if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) > die(_("unable to write new index file")); > > + if (ignore_case && has_duplicate_icase_entries(&the_index)) > + warning(_("This repository has paths that only differ in case\n" > + "and you have a case-insenitive filesystem which will\n" > + "cause problems.")); > + > err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1), > oid_to_hex(&oid), "1", NULL); I think the advice message should list the problematic file names. Even though this might be quite verbose it will help those affected to quickly find the problematic files to either fix this on their own or report to upstream (unless there's already an easy way to find those files - if so it should be mentioned in the message). Regards Simon -- + privacy is necessary + using gnupg http://gnupg.org + public key id: 0x92FEFDB7E44C32F9 ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 5:11 ` Duy Nguyen 2018-07-28 9:48 ` Simon Ruderich @ 2018-07-28 9:56 ` Jeff King 2018-07-28 18:05 ` brian m. carlson 2018-07-29 5:26 ` Duy Nguyen 1 sibling, 2 replies; 96+ messages in thread From: Jeff King @ 2018-07-28 9:56 UTC (permalink / raw) To: Duy Nguyen; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List On Sat, Jul 28, 2018 at 07:11:05AM +0200, Duy Nguyen wrote: > > It might be enough to just issue a warning and give an advise() hint > > that tells the user what's going on. Then they can decide what to do > > (hide both paths, or just work in the index, or move to a different fs, > > or complain to upstream). > > Yeah that may be the best option. Something like this perhaps? Not > sure how much detail the advice should be here. Yeah, something along these lines. I agree with Simon's comment elsewhere that this should probably mention the names. I don't know if we'd want to offer advice pointing them to using the sparse feature to work around it. > +static int has_duplicate_icase_entries(struct index_state *istate) > +{ > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + int found = 0; > + > + for (i = 0; i < istate->cache_nr; i++) > + string_list_append(&list, istate->cache[i]->name); > + > + list.cmp = strcasecmp; > + string_list_sort(&list); > + > + for (i = 1; i < list.nr; i++) { > + if (strcasecmp(list.items[i-1].string, > + list.items[i].string)) > + continue; > + found = 1; > + break; > + } > + string_list_clear(&list, 0); > + > + return found; > +} strcasecmp() will only catch a subset of the cases. We really need to follow the same folding rules that the filesystem would. For the case of clone, I actually wonder if we could detect during the checkout step that a file already exists. Since we know that the directory we started with was empty, then if it does, either: - there's some funny case-folding going on that means two paths in the repository map to the same name in the filesystem; or - somebody else is writing to the directory at the same time as us Either of which I think would be worth warning about. I'm not sure if we already lstat() the paths we're writing anyway as part of the checkout, so we might even get the feature "for free". -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 9:56 ` Jeff King @ 2018-07-28 18:05 ` brian m. carlson 2018-07-29 5:26 ` Duy Nguyen 1 sibling, 0 replies; 96+ messages in thread From: brian m. carlson @ 2018-07-28 18:05 UTC (permalink / raw) To: Jeff King; +Cc: Duy Nguyen, Paweł Paruzel, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 1113 bytes --] On Sat, Jul 28, 2018 at 05:56:59AM -0400, Jeff King wrote: > strcasecmp() will only catch a subset of the cases. We really need to > follow the same folding rules that the filesystem would. > > For the case of clone, I actually wonder if we could detect during the > checkout step that a file already exists. Since we know that the > directory we started with was empty, then if it does, either: > > - there's some funny case-folding going on that means two paths in the > repository map to the same name in the filesystem; or > > - somebody else is writing to the directory at the same time as us > > Either of which I think would be worth warning about. I'm not sure if we > already lstat() the paths we're writing anyway as part of the checkout, > so we might even get the feature "for free". This is possible to do. From the bug I accidentally introduced in 2.16, we know that on clone, there is a code path that is only traversed when we hit this case and only on case-insensitive file systems. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 867 bytes --] ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-28 9:56 ` Jeff King 2018-07-28 18:05 ` brian m. carlson @ 2018-07-29 5:26 ` Duy Nguyen 2018-07-29 9:28 ` Jeff King 1 sibling, 1 reply; 96+ messages in thread From: Duy Nguyen @ 2018-07-29 5:26 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List On Sat, Jul 28, 2018 at 11:57 AM Jeff King <peff@peff.net> wrote: > > +static int has_duplicate_icase_entries(struct index_state *istate) > > +{ > > + struct string_list list = STRING_LIST_INIT_NODUP; > > + int i; > > + int found = 0; > > + > > + for (i = 0; i < istate->cache_nr; i++) > > + string_list_append(&list, istate->cache[i]->name); > > + > > + list.cmp = strcasecmp; > > + string_list_sort(&list); > > + > > + for (i = 1; i < list.nr; i++) { > > + if (strcasecmp(list.items[i-1].string, > > + list.items[i].string)) > > + continue; > > + found = 1; > > + break; > > + } > > + string_list_clear(&list, 0); > > + > > + return found; > > +} > > strcasecmp() will only catch a subset of the cases. We really need to > follow the same folding rules that the filesystem would. True. But that's how we handle case insensitivity internally. If a filesytem has more sophisticated folding rules then git will not work well on that one anyway. > For the case of clone, I actually wonder if we could detect during the > checkout step that a file already exists. Since we know that the > directory we started with was empty, then if it does, either: > > - there's some funny case-folding going on that means two paths in the > repository map to the same name in the filesystem; or > > - somebody else is writing to the directory at the same time as us This is exactly what my first patch does (minus the sparse checkout part). But without knowing the exact folding rules, I don't think we can locate this "somebody else" who wrote the first path. So if N paths are treated the same by this filesystem, we could only report N-1 of them. If we want to report just one path when this happens though, then this works quite well. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-29 5:26 ` Duy Nguyen @ 2018-07-29 9:28 ` Jeff King 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-07-31 19:39 ` Git clone and case sensitivity Jeff Hostetler 0 siblings, 2 replies; 96+ messages in thread From: Jeff King @ 2018-07-29 9:28 UTC (permalink / raw) To: Duy Nguyen; +Cc: brian m. carlson, Paweł Paruzel, Git Mailing List On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote: > > strcasecmp() will only catch a subset of the cases. We really need to > > follow the same folding rules that the filesystem would. > > True. But that's how we handle case insensitivity internally. If a > filesytem has more sophisticated folding rules then git will not work > well on that one anyway. Hrm. Yeah, I guess that's the best we can do for the actual in-memory checks. Everything else depends on doing an actual filesystem operation, and our icase stuff kicks in way before then. I was mostly thinking of HFS+ utf8 normalization weirdness, but I guess people are accustomed to that by now. > > For the case of clone, I actually wonder if we could detect during the > > checkout step that a file already exists. Since we know that the > > directory we started with was empty, then if it does, either: > > > > - there's some funny case-folding going on that means two paths in the > > repository map to the same name in the filesystem; or > > > > - somebody else is writing to the directory at the same time as us > > This is exactly what my first patch does (minus the sparse checkout > part). Right, sorry, I should have read that one more carefully. > But without knowing the exact folding rules, I don't think we can > locate this "somebody else" who wrote the first path. So if N paths > are treated the same by this filesystem, we could only report N-1 of > them. > > If we want to report just one path when this happens though, then this > works quite well. Hmm. Since most such systems are case-preserving, would it be possible to report the name of the existing file? Doing it via opendir/readdir is hacky, and anyway puts the burden on us to find the matching name. Doing it via fstat() on the opened file doesn't work because at that the filesystem has resolved the name to an inode. So yeah, perhaps strcasecmp() is the best we can do (I do agree that being able to mention all of the conflicting names is a benefit). I guess we should be using fspathcmp(), though, in case it later learns to be smarter. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-29 9:28 ` Jeff King @ 2018-07-30 15:27 ` Nguyễn Thái Ngọc Duy 2018-07-31 18:23 ` Torsten Bögershausen ` (3 more replies) 2018-07-31 19:39 ` Git clone and case sensitivity Jeff Hostetler 1 sibling, 4 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-07-30 15:27 UTC (permalink / raw) To: peff; +Cc: git, pawelparuzel95, pclouds, sandals Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what's exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. I have not suggested any way to work around or fix this problem. But I guess we could probably have a section in Documentation/ dedicated to this problem and point there instead of a long advice in this warning. Another thing we probably should do is catch in "git checkout" too, not just "git clone" since your linux/unix colleage colleague may accidentally add some files that your mac/windows machine is not very happy with. But then there's another problem, once the problem is known, we probably should stop spamming this warning at every checkout, but how? Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/clone.c | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..32738c2737 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const struct ref *remote, } } +static void find_duplicate_icase_entries(struct index_state *istate, + struct string_list *dup) +{ + struct string_list list = STRING_LIST_INIT_NODUP; + int i; + + for (i = 0; i < istate->cache_nr; i++) + string_list_append(&list, istate->cache[i]->name); + + list.cmp = fspathcmp; + string_list_sort(&list); + + for (i = 1; i < list.nr; i++) { + const char *cur = list.items[i].string; + const char *prev = list.items[i - 1].string; + + if (dup->nr && + !fspathcmp(cur, dup->items[dup->nr - 1].string)) { + string_list_append(dup, cur); + } else if (!fspathcmp(cur, prev)) { + string_list_append(dup, prev); + string_list_append(dup, cur); + } + } + string_list_clear(&list, 0); +} + static int checkout(int submodule_progress) { struct object_id oid; @@ -761,6 +788,20 @@ static int checkout(int submodule_progress) if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) die(_("unable to write new index file")); + if (ignore_case) { + struct string_list dup = STRING_LIST_INIT_DUP; + int i; + + find_duplicate_icase_entries(&the_index, &dup); + if (dup.nr) { + warning(_("the following paths in this repository only differ in case and will\n" + "cause problems because you have cloned it on an case-insensitive filesytem:\n")); + for (i = 0; i < dup.nr; i++) + fprintf(stderr, "\t%s\n", dup.items[i].string); + } + string_list_clear(&dup, 0); + } + err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1), oid_to_hex(&oid), "1", NULL); -- 2.18.0.656.gda699b98b3 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy @ 2018-07-31 18:23 ` Torsten Bögershausen 2018-08-01 15:25 ` Duy Nguyen 2018-07-31 18:44 ` Elijah Newren ` (2 subsequent siblings) 3 siblings, 1 reply; 96+ messages in thread From: Torsten Bögershausen @ 2018-07-31 18:23 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: peff, git, pawelparuzel95, sandals On Mon, Jul 30, 2018 at 05:27:55PM +0200, Nguyễn Thái Ngọc Duy wrote: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what's exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. I have not suggested any way to work around or fix this > problem. But I guess we could probably have a section in > Documentation/ dedicated to this problem and point there instead of > a long advice in this warning. > > Another thing we probably should do is catch in "git checkout" too, > not just "git clone" since your linux/unix colleage colleague may > accidentally add some files that your mac/windows machine is not very > happy with. But then there's another problem, once the problem is > known, we probably should stop spamming this warning at every > checkout, but how? > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > builtin/clone.c | 41 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 41 insertions(+) > > diff --git a/builtin/clone.c b/builtin/clone.c > index 5c439f1394..32738c2737 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const struct ref *remote, > } > } > > +static void find_duplicate_icase_entries(struct index_state *istate, > + struct string_list *dup) > +{ > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + > + for (i = 0; i < istate->cache_nr; i++) > + string_list_append(&list, istate->cache[i]->name); > + > + list.cmp = fspathcmp; > + string_list_sort(&list); > + > + for (i = 1; i < list.nr; i++) { > + const char *cur = list.items[i].string; > + const char *prev = list.items[i - 1].string; > + > + if (dup->nr && > + !fspathcmp(cur, dup->items[dup->nr - 1].string)) { > + string_list_append(dup, cur); > + } else if (!fspathcmp(cur, prev)) { > + string_list_append(dup, prev); > + string_list_append(dup, cur); > + } > + } > + string_list_clear(&list, 0); > +} > + > static int checkout(int submodule_progress) > { > struct object_id oid; > @@ -761,6 +788,20 @@ static int checkout(int submodule_progress) > if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) > die(_("unable to write new index file")); > > + if (ignore_case) { > + struct string_list dup = STRING_LIST_INIT_DUP; > + int i; > + > + find_duplicate_icase_entries(&the_index, &dup); > + if (dup.nr) { > + warning(_("the following paths in this repository only differ in case and will\n" > + "cause problems because you have cloned it on an case-insensitive filesytem:\n")); Thanks for the patch. I wonder if we can tell the users more about the "problems" and how to avoid them, or to live with them. This is more loud thinking: "The following paths only differ in case\n" "One a case-insensitive file system only one at a time can be present\n" "You may rename one like this:\n" "git checkout <file> && git mv <file> <file>.1\n" > + fprintf(stderr, "\t%s\n", dup.items[i].string); Another question: Do we need any quote_path() here ? (This may be overkill, since typically the repos with conflicting names only use ASCII.) > + } > + string_list_clear(&dup, 0); > + } > + > err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1), > oid_to_hex(&oid), "1", NULL); > > -- > 2.18.0.656.gda699b98b3 > ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 18:23 ` Torsten Bögershausen @ 2018-08-01 15:25 ` Duy Nguyen 0 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-01 15:25 UTC (permalink / raw) To: Torsten Bögershausen Cc: Jeff King, Git Mailing List, Paweł Paruzel, brian m. carlson On Tue, Jul 31, 2018 at 8:23 PM Torsten Bögershausen <tboegi@web.de> wrote: > I wonder if we can tell the users more about the "problems" > and how to avoid them, or to live with them. > > This is more loud thinking: > > "The following paths only differ in case\n" > "One a case-insensitive file system only one at a time can be present\n" > "You may rename one like this:\n" > "git checkout <file> && git mv <file> <file>.1\n" Jeff gave a couple more options [1] to fix or workaround this. I think the problem is there is no single recommended way to deal with it. If there is, we can describe in this warning. But listing multiple options in this warning may be too much (the wall of text could easily take half a screen). Or if people agree on _one_ suggestion, I will gladly put it in. [1] https://public-inbox.org/git/CACsJy8A_uZM7nUmyERNHJMya0EyRQYTV7Dp2ikLznxnbOQU6tw@mail.gmail.com/T/#m60fedd7dc928a4d52eb5919811f84556f391a7b3 > > + fprintf(stderr, "\t%s\n", dup.items[i].string); > > Another question: > Do we need any quote_path() here ? > (This may be overkill, since typically the repos with conflicting names > only use ASCII.) Would be good to show trailing spaces in path names, so yes. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-07-31 18:23 ` Torsten Bögershausen @ 2018-07-31 18:44 ` Elijah Newren 2018-07-31 19:12 ` Junio C Hamano 2018-08-01 15:21 ` Duy Nguyen 2018-07-31 19:13 ` Junio C Hamano 2018-08-07 19:01 ` [PATCH v2] " Nguyễn Thái Ngọc Duy 3 siblings, 2 replies; 96+ messages in thread From: Elijah Newren @ 2018-07-31 18:44 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: Jeff King, Git Mailing List, pawelparuzel95, brian m. carlson On Mon, Jul 30, 2018 at 8:27 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what's exactly is "dirty". "what" rather than "what's"? > This patch helps the situation a bit by pointing out the problem at > clone time. I have not suggested any way to work around or fix this > problem. But I guess we could probably have a section in > Documentation/ dedicated to this problem and point there instead of > a long advice in this warning. > > Another thing we probably should do is catch in "git checkout" too, > not just "git clone" since your linux/unix colleage colleague may drop "colleage", keep "colleague"? > accidentally add some files that your mac/windows machine is not very > happy with. But then there's another problem, once the problem is > known, we probably should stop spamming this warning at every > checkout, but how? Good questions. I have no answers. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > builtin/clone.c | 41 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 41 insertions(+) > > diff --git a/builtin/clone.c b/builtin/clone.c > index 5c439f1394..32738c2737 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const struct ref *remote, > } > } > > +static void find_duplicate_icase_entries(struct index_state *istate, > + struct string_list *dup) > +{ > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + > + for (i = 0; i < istate->cache_nr; i++) > + string_list_append(&list, istate->cache[i]->name); > + > + list.cmp = fspathcmp; > + string_list_sort(&list); So, you sort the list by fspathcmp to get the entries that differ in case only next to each other. Makes sense... > + > + for (i = 1; i < list.nr; i++) { > + const char *cur = list.items[i].string; > + const char *prev = list.items[i - 1].string; > + > + if (dup->nr && > + !fspathcmp(cur, dup->items[dup->nr - 1].string)) { > + string_list_append(dup, cur); If we have at least one duplicate in dup (and currently we'd have to have at least two), and we now hit yet another (i.e. a third or fourth...) way of spelling the same path, then we add it. > + } else if (!fspathcmp(cur, prev)) { > + string_list_append(dup, prev); > + string_list_append(dup, cur); > + } ...otherwise, if we find a duplicate, we add both spellings to the dup list. > + } > + string_list_clear(&list, 0); > +} > + > static int checkout(int submodule_progress) > { > struct object_id oid; > @@ -761,6 +788,20 @@ static int checkout(int submodule_progress) > if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) > die(_("unable to write new index file")); > > + if (ignore_case) { > + struct string_list dup = STRING_LIST_INIT_DUP; > + int i; > + > + find_duplicate_icase_entries(&the_index, &dup); > + if (dup.nr) { > + warning(_("the following paths in this repository only differ in case and will\n" Perhaps I'm being excessively pedantic, but what if there are multiple pairs of paths differing in case? E.g. if someone has readme.txt, README.txt, foo.txt, and FOO.txt, you'll list all four files but readme.txt and foo.txt do not only differ in case. Maybe something like "...only differ in case from another path and will... " or is that too verbose and annoying? > + "cause problems because you have cloned it on an case-insensitive filesytem:\n")); > + for (i = 0; i < dup.nr; i++) > + fprintf(stderr, "\t%s\n", dup.items[i].string); > + } > + string_list_clear(&dup, 0); > + } > + > err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1), > oid_to_hex(&oid), "1", NULL); Is it worth attempting to also warn about paths that only differ in UTF-normalization on relevant MacOS systems? ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 18:44 ` Elijah Newren @ 2018-07-31 19:12 ` Junio C Hamano 2018-07-31 19:29 ` Jeff King 2018-08-01 15:21 ` Duy Nguyen 1 sibling, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-07-31 19:12 UTC (permalink / raw) To: Elijah Newren Cc: Nguyễn Thái Ngọc Duy, Jeff King, Git Mailing List, pawelparuzel95, brian m. carlson Elijah Newren <newren@gmail.com> writes: > Is it worth attempting to also warn about paths that only differ in > UTF-normalization on relevant MacOS systems? I hate to bring up a totally different approach this late in the party, but I wonder if it makes more sense to take advantage of "clone" being a command that starts from an empty working tree. builtin/clone.c::checkout() drives a single-tree unpack_trees(), using oneway_merge() as its callback and at the end, eventually unpack_trees.c:check_updates() will call into checkout_entry() to perform the usual "unlink and then create" dance. I wonder if it makes sense to introduce a new option to tell the machinery to report when the final checkout_entry() notices that it needed to remove the working tree file to make room (perhaps that bit would go in "struct unpack_trees_options"). In the initial checkout codepath for a freshly cloned repository, that would only happen when your tree has two (or more) paths that gets smashed by case insensitive or UTF-normalizing filesystem, and the code we would maintain do not have to care how exactly the filesystem collapses two (or more) paths if we go that way. We only need to report "we tried to check out X but it seems your filesystem equates something else that is also in the project to X". ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 19:12 ` Junio C Hamano @ 2018-07-31 19:29 ` Jeff King 2018-07-31 20:12 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-07-31 19:29 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren, Nguyễn Thái Ngọc Duy, Git Mailing List, pawelparuzel95, brian m. carlson On Tue, Jul 31, 2018 at 12:12:58PM -0700, Junio C Hamano wrote: > Elijah Newren <newren@gmail.com> writes: > > > Is it worth attempting to also warn about paths that only differ in > > UTF-normalization on relevant MacOS systems? > > I hate to bring up a totally different approach this late in the > party, but I wonder if it makes more sense to take advantage of > "clone" being a command that starts from an empty working tree. > > builtin/clone.c::checkout() drives a single-tree unpack_trees(), > using oneway_merge() as its callback and at the end, eventually > unpack_trees.c:check_updates() will call into checkout_entry() > to perform the usual "unlink and then create" dance. > > I wonder if it makes sense to introduce a new option to tell the > machinery to report when the final checkout_entry() notices that it > needed to remove the working tree file to make room (perhaps that > bit would go in "struct unpack_trees_options"). In the initial > checkout codepath for a freshly cloned repository, that would only > happen when your tree has two (or more) paths that gets smashed > by case insensitive or UTF-normalizing filesystem, and the code we > would maintain do not have to care how exactly the filesystem > collapses two (or more) paths if we go that way. We only need to > report "we tried to check out X but it seems your filesystem equates > something else that is also in the project to X". Heh. See my similar suggestion in: https://public-inbox.org/git/20180728095659.GA21450@sigill.intra.peff.net/ and the response from Duy. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 19:29 ` Jeff King @ 2018-07-31 20:12 ` Junio C Hamano 2018-07-31 20:37 ` Jeff King 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-07-31 20:12 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren, Nguyễn Thái Ngọc Duy, Git Mailing List, pawelparuzel95, brian m. carlson Jeff King <peff@peff.net> writes: > On Tue, Jul 31, 2018 at 12:12:58PM -0700, Junio C Hamano wrote: > ... >> collapses two (or more) paths if we go that way. We only need to >> report "we tried to check out X but it seems your filesystem equates >> something else that is also in the project to X". > > Heh. See my similar suggestion in: > > https://public-inbox.org/git/20180728095659.GA21450@sigill.intra.peff.net/ > > and the response from Duy. Yes, but is there a reason why we need to report what that "something else" is? Presumably we are already in an error codepath, so if it is absolutely necessary, then we can issue a lstat() to grab the inum for the path we are about to create, iterate over the previously checked out paths issuing lstat() and see which one yields the same inum, to find the one who is the culprit. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 20:12 ` Junio C Hamano @ 2018-07-31 20:37 ` Jeff King 2018-07-31 20:57 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-07-31 20:37 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren, Nguyễn Thái Ngọc Duy, Git Mailing List, pawelparuzel95, brian m. carlson On Tue, Jul 31, 2018 at 01:12:14PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Tue, Jul 31, 2018 at 12:12:58PM -0700, Junio C Hamano wrote: > > ... > >> collapses two (or more) paths if we go that way. We only need to > >> report "we tried to check out X but it seems your filesystem equates > >> something else that is also in the project to X". > > > > Heh. See my similar suggestion in: > > > > https://public-inbox.org/git/20180728095659.GA21450@sigill.intra.peff.net/ > > > > and the response from Duy. > > Yes, but is there a reason why we need to report what that > "something else" is? I don't think it's strictly necessary, but it probably makes things easier for the user. That said... > Presumably we are already in an error codepath, so if it is > absolutely necessary, then we can issue a lstat() to grab the inum > for the path we are about to create, iterate over the previously > checked out paths issuing lstat() and see which one yields the same > inum, to find the one who is the culprit. Yes, this is the cleverness I was missing in my earlier response. So it seems do-able, and I like that this incurs no cost in the non-error case. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 20:37 ` Jeff King @ 2018-07-31 20:57 ` Junio C Hamano 2018-08-01 21:20 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-07-31 20:57 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren, Nguyễn Thái Ngọc Duy, Git Mailing List, pawelparuzel95, brian m. carlson Jeff King <peff@peff.net> writes: >> Presumably we are already in an error codepath, so if it is >> absolutely necessary, then we can issue a lstat() to grab the inum >> for the path we are about to create, iterate over the previously >> checked out paths issuing lstat() and see which one yields the same >> inum, to find the one who is the culprit. > > Yes, this is the cleverness I was missing in my earlier response. > > So it seems do-able, and I like that this incurs no cost in the > non-error case. Not so fast, unfortunately. I suspect that some filesystems do not give us inum that we can use for that "identity" purpose, and they tend to be the ones with the case smashing characteristics where we need this code in the error path the most X-<. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 20:57 ` Junio C Hamano @ 2018-08-01 21:20 ` Junio C Hamano 2018-08-02 14:43 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-01 21:20 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren, Nguyễn Thái Ngọc Duy, Git Mailing List, pawelparuzel95, brian m. carlson Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > >>> Presumably we are already in an error codepath, so if it is >>> absolutely necessary, then we can issue a lstat() to grab the inum >>> for the path we are about to create, iterate over the previously >>> checked out paths issuing lstat() and see which one yields the same >>> inum, to find the one who is the culprit. >> >> Yes, this is the cleverness I was missing in my earlier response. >> >> So it seems do-able, and I like that this incurs no cost in the >> non-error case. > > Not so fast, unfortunately. > > I suspect that some filesystems do not give us inum that we can use > for that "identity" purpose, and they tend to be the ones with the > case smashing characteristics where we need this code in the error > path the most X-<. But even if inum is unreliable, we should be able to use other clues, perhaps the same set of fields we use for cached stat matching optimization we use for "diff" plumbing commands, to implement the error report. The more important part of the idea is that we already need to notice that we need to remove a path that is in the working tree while doing the checkout, so the alternative approach won't incur any extra cost for normal cases where the project being checked out does not have two files whose pathnames are only different in case (or checking out such an offending project to a case sensitive filesytem, of course). So I guess it still _is_ workable. Any takers? ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-01 21:20 ` Junio C Hamano @ 2018-08-02 14:43 ` Duy Nguyen 2018-08-02 16:27 ` Junio C Hamano 2018-08-03 14:28 ` Torsten Bögershausen 0 siblings, 2 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-02 14:43 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On Wed, Aug 1, 2018 at 11:20 PM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > Jeff King <peff@peff.net> writes: > > > >>> Presumably we are already in an error codepath, so if it is > >>> absolutely necessary, then we can issue a lstat() to grab the inum > >>> for the path we are about to create, iterate over the previously > >>> checked out paths issuing lstat() and see which one yields the same > >>> inum, to find the one who is the culprit. > >> > >> Yes, this is the cleverness I was missing in my earlier response. > >> > >> So it seems do-able, and I like that this incurs no cost in the > >> non-error case. > > > > Not so fast, unfortunately. > > > > I suspect that some filesystems do not give us inum that we can use > > for that "identity" purpose, and they tend to be the ones with the > > case smashing characteristics where we need this code in the error > > path the most X-<. > > But even if inum is unreliable, we should be able to use other > clues, perhaps the same set of fields we use for cached stat > matching optimization we use for "diff" plumbing commands, to > implement the error report. The more important part of the idea is > that we already need to notice that we need to remove a path that is > in the working tree while doing the checkout, so the alternative > approach won't incur any extra cost for normal cases where the > project being checked out does not have two files whose pathnames > are only different in case (or checking out such an offending > project to a case sensitive filesytem, of course). > > So I guess it still _is_ workable. Any takers? OK so we're going back to the original way of checking that we check out the different files on the same place (because fs is icase) and try to collect all paths for reporting, yes? I can give it another go (but of course if anybody else steps up, I'd very gladly hand this over) -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-02 14:43 ` Duy Nguyen @ 2018-08-02 16:27 ` Junio C Hamano 2018-08-02 19:06 ` Jeff King 2018-08-03 14:28 ` Torsten Bögershausen 1 sibling, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-02 16:27 UTC (permalink / raw) To: Duy Nguyen Cc: Jeff King, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson Duy Nguyen <pclouds@gmail.com> writes: >> But even if inum is unreliable, we should be able to use other >> clues, perhaps the same set of fields we use for cached stat >> matching optimization we use for "diff" plumbing commands, to >> implement the error report. The more important part of the idea is >> that we already need to notice that we need to remove a path that is >> in the working tree while doing the checkout, so the alternative >> approach won't incur any extra cost for normal cases where the >> project being checked out does not have two files whose pathnames >> are only different in case (or checking out such an offending >> project to a case sensitive filesytem, of course). >> >> So I guess it still _is_ workable. Any takers? > > OK so we're going back to the original way of checking that we check > out the different files on the same place (because fs is icase) and > try to collect all paths for reporting, yes? I can give it another go > (but of course if anybody else steps up, I'd very gladly hand this > over) Detect and report, definitely yes; I am not sure about collect all (personally I am OK if we stopped at reporting "I tried to check out X but your project tree has something else that is turned to X by your pathname-smashing filesystem" without making it a requirement to report what the other one that conflict with X is. Of course, reporting the other side _is_ nicer and I'd be happier if we can do so without too much ugly code, but I do not think it is a hard requirement. Thanks. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-02 16:27 ` Junio C Hamano @ 2018-08-02 19:06 ` Jeff King 2018-08-02 21:14 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-02 19:06 UTC (permalink / raw) To: Junio C Hamano Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On Thu, Aug 02, 2018 at 09:27:31AM -0700, Junio C Hamano wrote: > > OK so we're going back to the original way of checking that we check > > out the different files on the same place (because fs is icase) and > > try to collect all paths for reporting, yes? I can give it another go > > (but of course if anybody else steps up, I'd very gladly hand this > > over) > > Detect and report, definitely yes; I am not sure about collect all > (personally I am OK if we stopped at reporting "I tried to check out > X but your project tree has something else that is turned to X by > your pathname-smashing filesystem" without making it a requirement > to report what the other one that conflict with X is. Of course, > reporting the other side _is_ nicer and I'd be happier if we can do > so without too much ugly code, but I do not think it is a hard > requirement. Yeah, I think it would be OK to issue the warning for the conflicted path, and then if we _can_ produce the secondary list of colliding paths, do so. Even on a system with working inodes, we may not come up with a match (e.g., if it wasn't us who wrote the file, but rather we raced with some other process). I also wonder if Windows could return some other file-unique identifier that would work in place of an inode here. That would be pretty easy to swap in via an #ifdef's helper function. I'd be OK shipping without that and letting Windows folks fill it in later (as long as we do not do anything too stupid until then, like claim all of the inode==0 files are the same). -Peff PS It occurs to me that doing this naively (re-scan the entries already checked out when we see a collision) ends up quadratic over the number of entries in the worst case. That may not matter. You'd only have a handful of collisions normally, and anybody malicious can already git-bomb your checkout anyway. If we care, an alternative would be to set a flag for "I saw some collisions", and then follow up with a single pass putting entries into a hashmap of inode->filename. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-02 19:06 ` Jeff King @ 2018-08-02 21:14 ` Junio C Hamano 2018-08-02 21:28 ` Jeff King 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-02 21:14 UTC (permalink / raw) To: Jeff King Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson Jeff King <peff@peff.net> writes: > I also wonder if Windows could return some other file-unique identifier > that would work in place of an inode here. That would be pretty easy to > swap in via an #ifdef's helper function. I'd be OK shipping without that > and letting Windows folks fill it in later (as long as we do not do > anything too stupid until then, like claim all of the inode==0 files are > the same). Yeah, but such a useful file-unique identifier would probably be used in place of inum in their (l)stat emulation already, if exists, no? > PS It occurs to me that doing this naively (re-scan the entries already > checked out when we see a collision) ends up quadratic over the > number of entries in the worst case. That may not matter. You'd only > have a handful of collisions normally, and anybody malicious can > already git-bomb your checkout anyway. If we care, an alternative > would be to set a flag for "I saw some collisions", and then follow > up with a single pass putting entries into a hashmap of > inode->filename. Yeah, that makes sense. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-02 21:14 ` Junio C Hamano @ 2018-08-02 21:28 ` Jeff King 2018-08-03 18:23 ` Jeff Hostetler 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-02 21:28 UTC (permalink / raw) To: Junio C Hamano Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I also wonder if Windows could return some other file-unique identifier > > that would work in place of an inode here. That would be pretty easy to > > swap in via an #ifdef's helper function. I'd be OK shipping without that > > and letting Windows folks fill it in later (as long as we do not do > > anything too stupid until then, like claim all of the inode==0 files are > > the same). > > Yeah, but such a useful file-unique identifier would probably be > used in place of inum in their (l)stat emulation already, if exists, > no? Maybe. It might not work as ino_t. Or it might be expensive to get. Or maybe it's simply impossible. I don't know much about Windows. Some searching implies that NTFS does have a "file index" concept which is supposed to be unique. At any rate, until we have an actual plan for Windows, I think it would make sense only to split the cases into "has working inodes" and "other", and make sure "other" does something sensible in the meantime (like mention the conflict, but skip trying to list duplicates). When somebody wants to work on Windows support, then we can figure out if it just needs to wrap the "get unique identifier" operation, or if it would use a totally different algorithm. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-02 21:28 ` Jeff King @ 2018-08-03 18:23 ` Jeff Hostetler 2018-08-03 18:49 ` Junio C Hamano 2018-08-03 18:53 ` Jeff King 0 siblings, 2 replies; 96+ messages in thread From: Jeff Hostetler @ 2018-08-03 18:23 UTC (permalink / raw) To: Jeff King, Junio C Hamano Cc: Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On 8/2/2018 5:28 PM, Jeff King wrote: > On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >>> I also wonder if Windows could return some other file-unique identifier >>> that would work in place of an inode here. That would be pretty easy to >>> swap in via an #ifdef's helper function. I'd be OK shipping without that >>> and letting Windows folks fill it in later (as long as we do not do >>> anything too stupid until then, like claim all of the inode==0 files are >>> the same). >> >> Yeah, but such a useful file-unique identifier would probably be >> used in place of inum in their (l)stat emulation already, if exists, >> no? > > Maybe. It might not work as ino_t. Or it might be expensive to get. Or > maybe it's simply impossible. I don't know much about Windows. Some > searching implies that NTFS does have a "file index" concept which is > supposed to be unique. This is hard and/or expensive on Windows. Yes, you can get the "file index" values for an open file handle with a cost similar to an fstat(). Unfortunately, the FindFirst/FindNext routines (equivalent to the opendir/readdir routines), don't give you that data. So we'd have to scan the directory and then open and stat each file. This is terribly expensive on Windows -- and the reason we have the fscache layer (in the GfW version) to intercept the lstat() calls whenever possible. It might be possible to use the NTFS Master File Table to discover this (very big handwave), but I would need to do a little digging. This would all be NTFS specific. FAT and other volume types would not be covered. Another thing to keep in mind is that the collision could be because of case folding (or other such nonsense) on a directory in the path. I mean, if someone on Linux builds a commit containing: a/b/c/D/e/foo.txt a/b/c/d/e/foo.txt we'll get a similar collision as if one of them were spelled "FOO.txt". Also, do we need to worry about hard-links or symlinks here? If checkout populates symlinks, then you might have another collision opportunity. For example: a/b/c/D/e/foo.txt a/link -> ./b/c/d a/link/e/foo.txt Also, some platforms (like the Mac) allow directory hard-links. Granted, Git doesn't create hard-links during checkout, but the user might. I'm sure there are other edge cases here that make reporting difficult; these are just a few I thought of. I guess what I'm trying to say is that as a first step just report that you found a collision -- without trying to identify the set existing objects that it collided with. > > At any rate, until we have an actual plan for Windows, I think it would > make sense only to split the cases into "has working inodes" and > "other", and make sure "other" does something sensible in the meantime > (like mention the conflict, but skip trying to list duplicates). Yes, this should be split. Do the "easy" Linux version first. Keep in mind that there may also be a different solution for the Mac. > When somebody wants to work on Windows support, then we can figure out > if it just needs to wrap the "get unique identifier" operation, or if it > would use a totally different algorithm. > > -Peff > Jeff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-03 18:23 ` Jeff Hostetler @ 2018-08-03 18:49 ` Junio C Hamano 2018-08-03 18:53 ` Jeff King 1 sibling, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-03 18:49 UTC (permalink / raw) To: Jeff Hostetler Cc: Jeff King, Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson Jeff Hostetler <git@jeffhostetler.com> writes: > Another thing to keep in mind is that the collision could be because > of case folding (or other such nonsense) on a directory in the path. > I mean, if someone on Linux builds a commit containing: > > a/b/c/D/e/foo.txt > a/b/c/d/e/foo.txt > > we'll get a similar collision as if one of them were spelled "FOO.txt". I'd think the approach to teach checkout_entry() codepath to notice it needed to unlink the existing file in order to check out the entry it wanted to check out would cover this equally well. > Also, do we need to worry about hard-links or symlinks here? I do not think so. You do not get a file with multiple hardlinks in a "git clone" or "git checkout" result, and we do not check things out beyond a symbolic link in the first place. > If checkout populates symlinks, then you might have another collision > opportunity. For example: > > a/b/c/D/e/foo.txt > a/link -> ./b/c/d > a/link/e/foo.txt In other words, a tree with a/link (symlink) and a/link/<anything> that requires a/link to be a symlink and a directory at the same time cannot be created, so you won't get one with "git clone" > Also, some platforms (like the Mac) allow directory hard-links. > Granted, Git doesn't create hard-links during checkout, but the > user might. And we'd report "we are doing a fresh checkout immediately after a clone and saw some file we haven't created, which may indicate a case smashing filesystem glitch (or a competing third-party process creating random files)", so noticing that would be a good thing, I would think. > I'm sure there are other edge cases here that make reporting > difficult; these are just a few I thought of. I guess what I'm > trying to say is that as a first step just report that you found > a collision -- without trying to identify the set existing objects > that it collided with. Yup, I think that is sensible. If it can be done cheaply, i.e. on a filesystem with trustable and cheap inum, after noticing such a collision, go back and lstat() all paths in the index we have checked out so far to see which ones are colliding, it adds useful clue to the report, but noticing the collision in the first place obviously has more value. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-03 18:23 ` Jeff Hostetler 2018-08-03 18:49 ` Junio C Hamano @ 2018-08-03 18:53 ` Jeff King 2018-08-05 14:01 ` Jeff Hostetler 1 sibling, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-03 18:53 UTC (permalink / raw) To: Jeff Hostetler Cc: Junio C Hamano, Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote: > > Maybe. It might not work as ino_t. Or it might be expensive to get. Or > > maybe it's simply impossible. I don't know much about Windows. Some > > searching implies that NTFS does have a "file index" concept which is > > supposed to be unique. > > This is hard and/or expensive on Windows. Yes, you can get the > "file index" values for an open file handle with a cost similar to > an fstat(). Unfortunately, the FindFirst/FindNext routines (equivalent > to the opendir/readdir routines), don't give you that data. So we'd > have to scan the directory and then open and stat each file. This is > terribly expensive on Windows -- and the reason we have the fscache > layer (in the GfW version) to intercept the lstat() calls whenever > possible. I think that high cost might be OK for our purposes here. This code would _only_ kick in during a clone, and then only on the error path once we knew we had a collision during the checkout step. > Another thing to keep in mind is that the collision could be because > of case folding (or other such nonsense) on a directory in the path. > I mean, if someone on Linux builds a commit containing: > > a/b/c/D/e/foo.txt > a/b/c/d/e/foo.txt > > we'll get a similar collision as if one of them were spelled "FOO.txt". True, though I think that may be OK. If you had conflicting directories you'd get a _ton_ of duplicates listed, but that makes sense: you actually have a ton of duplicates. > Also, do we need to worry about hard-links or symlinks here? I think we can ignore hardlinks. Git never creates them, and we know the directory was empty when we started. Symlinks should be handled by using lstat(). (Obviously that's for a Unix-ish platform). > I'm sure there are other edge cases here that make reporting > difficult; these are just a few I thought of. I guess what I'm > trying to say is that as a first step just report that you found > a collision -- without trying to identify the set existing objects > that it collided with. I certainly don't disagree with that. :) > > At any rate, until we have an actual plan for Windows, I think it would > > make sense only to split the cases into "has working inodes" and > > "other", and make sure "other" does something sensible in the meantime > > (like mention the conflict, but skip trying to list duplicates). > > Yes, this should be split. Do the "easy" Linux version first. > Keep in mind that there may also be a different solution for the Mac. I assumed that an inode-based solution would work for Mac, since it's mostly BSD under the hood. There may be subtleties I don't know about, though. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-03 18:53 ` Jeff King @ 2018-08-05 14:01 ` Jeff Hostetler 0 siblings, 0 replies; 96+ messages in thread From: Jeff Hostetler @ 2018-08-05 14:01 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Duy Nguyen, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On 8/3/2018 2:53 PM, Jeff King wrote: > On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote: > >>> Maybe. It might not work as ino_t. Or it might be expensive to get. Or >>> maybe it's simply impossible. I don't know much about Windows. Some >>> searching implies that NTFS does have a "file index" concept which is >>> supposed to be unique. >> >> This is hard and/or expensive on Windows. Yes, you can get the >> "file index" values for an open file handle with a cost similar to >> an fstat(). Unfortunately, the FindFirst/FindNext routines (equivalent >> to the opendir/readdir routines), don't give you that data. So we'd >> have to scan the directory and then open and stat each file. This is >> terribly expensive on Windows -- and the reason we have the fscache >> layer (in the GfW version) to intercept the lstat() calls whenever >> possible. > > I think that high cost might be OK for our purposes here. This code > would _only_ kick in during a clone, and then only on the error path > once we knew we had a collision during the checkout step. > Good point. I've confirmed that the "file index" values can be used to determine whether 2 path names are equivalent under NTFS for case variation, final-dot/space, and short-names vs long-names. I ran out of time this morning to search the directory for equivalent paths. I'll look at that shortly. Jeff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-08-02 14:43 ` Duy Nguyen 2018-08-02 16:27 ` Junio C Hamano @ 2018-08-03 14:28 ` Torsten Bögershausen 1 sibling, 0 replies; 96+ messages in thread From: Torsten Bögershausen @ 2018-08-03 14:28 UTC (permalink / raw) To: Duy Nguyen, Junio C Hamano Cc: Jeff King, Elijah Newren, Git Mailing List, Paweł Paruzel, brian m. carlson On 2018-08-02 15:43, Duy Nguyen wrote: > On Wed, Aug 1, 2018 at 11:20 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Junio C Hamano <gitster@pobox.com> writes: >> >>> Jeff King <peff@peff.net> writes: >>> >>>>> Presumably we are already in an error codepath, so if it is >>>>> absolutely necessary, then we can issue a lstat() to grab the inum >>>>> for the path we are about to create, iterate over the previously >>>>> checked out paths issuing lstat() and see which one yields the same >>>>> inum, to find the one who is the culprit. >>>> >>>> Yes, this is the cleverness I was missing in my earlier response. >>>> >>>> So it seems do-able, and I like that this incurs no cost in the >>>> non-error case. >>> >>> Not so fast, unfortunately. >>> >>> I suspect that some filesystems do not give us inum that we can use >>> for that "identity" purpose, and they tend to be the ones with the >>> case smashing characteristics where we need this code in the error >>> path the most X-<. >> >> But even if inum is unreliable, we should be able to use other >> clues, perhaps the same set of fields we use for cached stat >> matching optimization we use for "diff" plumbing commands, to >> implement the error report. The more important part of the idea is >> that we already need to notice that we need to remove a path that is >> in the working tree while doing the checkout, so the alternative >> approach won't incur any extra cost for normal cases where the >> project being checked out does not have two files whose pathnames >> are only different in case (or checking out such an offending >> project to a case sensitive filesytem, of course). >> >> So I guess it still _is_ workable. Any takers? > > OK so we're going back to the original way of checking that we check > out the different files on the same place (because fs is icase) and > try to collect all paths for reporting, yes? I would say: Yes. > I can give it another go > (but of course if anybody else steps up, I'd very gladly hand this > over) > Not at the moment. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 18:44 ` Elijah Newren 2018-07-31 19:12 ` Junio C Hamano @ 2018-08-01 15:21 ` Duy Nguyen 1 sibling, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-01 15:21 UTC (permalink / raw) To: Elijah Newren Cc: Jeff King, Git Mailing List, Paweł Paruzel, brian m. carlson On Tue, Jul 31, 2018 at 8:44 PM Elijah Newren <newren@gmail.com> wrote: > Is it worth attempting to also warn about paths that only differ in > UTF-normalization on relevant MacOS systems? Down this thread, Jeff Hostetler drew a scarier picture of "case" handling on MacOS and Windows. I think we should start with support things like utf normalization... in core.ignore first before doing the warning stuff. At that point we know well how to detect and warn, or any other pitfalls. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-07-31 18:23 ` Torsten Bögershausen 2018-07-31 18:44 ` Elijah Newren @ 2018-07-31 19:13 ` Junio C Hamano 2018-08-01 15:16 ` Duy Nguyen 2018-08-07 19:01 ` [PATCH v2] " Nguyễn Thái Ngọc Duy 3 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-07-31 19:13 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: peff, git, pawelparuzel95, sandals Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Another thing we probably should do is catch in "git checkout" too, > not just "git clone" since your linux/unix colleage colleague may > accidentally add some files that your mac/windows machine is not very > happy with. Then you would catch it not in checkout but in add, no? ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems 2018-07-31 19:13 ` Junio C Hamano @ 2018-08-01 15:16 ` Duy Nguyen 0 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-01 15:16 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Git Mailing List, Paweł Paruzel, brian m. carlson On Tue, Jul 31, 2018 at 9:13 PM Junio C Hamano <gitster@pobox.com> wrote: > > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > > > Another thing we probably should do is catch in "git checkout" too, > > not just "git clone" since your linux/unix colleage colleague may > > accidentally add some files that your mac/windows machine is not very > > happy with. > > Then you would catch it not in checkout but in add, no? No because the guy who added these may have done it on a case-sensitive filesystem. Only later when his friend fetches new changes on a case-insensitive filesytem, the problem becomes real. If in this scenario, core.ignore is enforced on all machines, then yes we could catch it at "git add" (and we should already do that or we have a bug). At least in open source project setting, I think enforcing core.ignore will not work. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy ` (2 preceding siblings ...) 2018-07-31 19:13 ` Junio C Hamano @ 2018-08-07 19:01 ` Nguyễn Thái Ngọc Duy 2018-08-07 19:31 ` Junio C Hamano 2018-08-10 15:36 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy 3 siblings, 2 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-08-07 19:01 UTC (permalink / raw) To: pclouds Cc: git, pawelparuzel95, peff, sandals, Elijah Newren, tboegi, Junio C Hamano, git Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. I do not make any suggestions to fix or workaround the problem because there are many different options, especially when the problem comes from folding rules other than case (e.g. UTF-8 normalization, Windows special paths...) In the previous iteration, inode has been suggested to find the matching entry. But it is platform specific, and because we already have a common function for matching stat, the function is used here even if it's more expensive. Bonus point is we don't need some "#ifdef platform" around this code. The cost goes higher when we find duplicated entries at the bottom of the index, but the number of these entries should be very small that total extra cost should not be really noticeable. This patch is tested with vim-colorschemes repository on a JFS partion with case insensitive support on Linux. This repository has two files darkBlue.vim and darkblue.vim. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- v2 has completely different approach so no point in sending interdiff. One nice thing about this is we don't need platform specific code for detecting the duplicate entries. I think ce_match_stat() works even on Windows. And it's now equally expensive on all platforms :D builtin/clone.c | 1 + cache.h | 2 ++ entry.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ unpack-trees.c | 23 +++++++++++++++++++++++ unpack-trees.h | 1 + 5 files changed, 71 insertions(+) diff --git a/builtin/clone.c b/builtin/clone.c index 9ebb5acf56..38d5609282 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -748,6 +748,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8dc7134f00..cdf0984707 100644 --- a/cache.h +++ b/cache.h @@ -1515,9 +1515,11 @@ struct checkout { const char *base_dir; int base_dir_len; struct delayed_checkout *delayed_checkout; + int *nr_duplicates; unsigned force:1, quiet:1, not_new:1, + clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..3917bfc874 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,47 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_duplicate_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i; + int *count = state->nr_duplicates; + + if (!count) + BUG("state->nr_duplicates must not be NULL"); + + ce->ce_flags |= CE_MATCHED; + (*count)++; + + if (!state->refresh_cache) + BUG("We need this to narrow down the set of updated entries"); + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + /* + * This entry has not been checked out yet, otherwise + * its stat info must have been updated. And since we + * check out from top to bottom, the rest is guaranteed + * not checked out. Stop now. + */ + if (!ce_uptodate(dup)) + break; + + if (dup->ce_flags & CE_MATCHED) + continue; + + if (ce_match_stat(dup, st, + CE_MATCH_IGNORE_VALID | + CE_MATCH_IGNORE_SKIP_WORKTREE)) + continue; + + dup->ce_flags |= CE_MATCHED; + (*count)++; + break; + } +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +496,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_duplicate_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/unpack-trees.c b/unpack-trees.c index f9efee0836..1b0c11142a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -344,12 +344,20 @@ static int check_updates(struct unpack_trees_options *o) struct index_state *index = &o->result; struct checkout state = CHECKOUT_INIT; int i; + int nr_duplicates = 0; state.force = 1; state.quiet = 1; state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + state.nr_duplicates = &nr_duplicates; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -414,6 +422,21 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(&state); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + + if (o->clone && state.nr_duplicates) { + warning(_("the following paths in this repository only differ in case\n" + "from another path and will cause problems because you have cloned\n" + "it on an case-insensitive filesytem:\n")); + for (i = 0; i < index->cache_nr; i++) { + struct cache_entry *ce = index->cache[i]; + + if (!(ce->ce_flags & CE_MATCHED)) + continue; + fprintf(stderr, " '%s'\n", ce->name); + ce->ce_flags &= ~CE_MATCHED; + } + } + return errs != 0; } diff --git a/unpack-trees.h b/unpack-trees.h index c2b434c606..d940f1c5c2 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -42,6 +42,7 @@ struct unpack_trees_options { unsigned int reset, merge, update, + clone, index_only, nontrivial_merge, trivial_merges_only, -- 2.18.0.915.gd571298aae ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-07 19:01 ` [PATCH v2] " Nguyễn Thái Ngọc Duy @ 2018-08-07 19:31 ` Junio C Hamano 2018-08-08 19:48 ` Jeff Hostetler 2018-08-10 15:36 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy 1 sibling, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-07 19:31 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, pawelparuzel95, peff, sandals, Elijah Newren, tboegi, git Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > One nice thing about this is we don't need platform specific code for > detecting the duplicate entries. I think ce_match_stat() works even > on Windows. And it's now equally expensive on all platforms :D ce_match_stat() may not be a very good measure to see if two paths refer to the same file, though. After a fresh checkout, I would not be surprised if two completely unrelated paths have the same size and have same mtime/ctime. In its original use case, i.e. "I have one specific path in mind and took a snapshot of its metadata earlier. Is it still the same, or has it been touched?", that may be sufficient to detect that the path has been _modified_, but without reliable inum, it may be a poor measure to say two paths refer to the same. > builtin/clone.c | 1 + > cache.h | 2 ++ > entry.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ > unpack-trees.c | 23 +++++++++++++++++++++++ > unpack-trees.h | 1 + > 5 files changed, 71 insertions(+) Having said that, it is pleasing to see that this can be achieved with so little additional code. > +static void mark_duplicate_entries(const struct checkout *state, > + struct cache_entry *ce, struct stat *st) > +{ > + int i; > + int *count = state->nr_duplicates; > + > + if (!count) > + BUG("state->nr_duplicates must not be NULL"); > + > + ce->ce_flags |= CE_MATCHED; > + (*count)++; > + > + if (!state->refresh_cache) > + BUG("We need this to narrow down the set of updated entries"); > + > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + /* > + * This entry has not been checked out yet, otherwise > + * its stat info must have been updated. And since we > + * check out from top to bottom, the rest is guaranteed > + * not checked out. Stop now. > + */ > + if (!ce_uptodate(dup)) > + break; > + > + if (dup->ce_flags & CE_MATCHED) > + continue; > + > + if (ce_match_stat(dup, st, > + CE_MATCH_IGNORE_VALID | > + CE_MATCH_IGNORE_SKIP_WORKTREE)) > + continue; > + > + dup->ce_flags |= CE_MATCHED; > + (*count)++; > + break; > + } > +} > + Hmph. If there is only one true collision, then all its aliases will be marked with CE_MATCHED bit every time the second and the subsequent alias is checked out (as the caller calls this function when it noticed that something already is at the path ce wants to go). But if there are two sets of colliding paths, because there is only one bit used, we do not group the paths into these two sets and report, e.g. "blue.txt, BLUE.txt and BLUE.TXT collide. red.txt and RED.txt also collide." I am not sure if computing that is too much work for too little gain, but because this is in an error codepath, it may be worth doing. I dunno. > + > + if (o->clone && state.nr_duplicates) { > + warning(_("the following paths in this repository only differ in case\n" > + "from another path and will cause problems because you have cloned\n" > + "it on an case-insensitive filesytem:\n")); With the new approach, we no longer preemptively detect that the project will be harmed by a case smashing filesystems before it happens. This instead reports that the project has already been harmed on _this_ system by such a filesystem after the fact. So from the end-user's point of view, "will cause problems" may be a message that came a bit too late. "have collided and only one from the same colliding group is in the working tree; others failed to be checked out" is probably closer to the truth. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-07 19:31 ` Junio C Hamano @ 2018-08-08 19:48 ` Jeff Hostetler 2018-08-08 22:31 ` Jeff King 0 siblings, 1 reply; 96+ messages in thread From: Jeff Hostetler @ 2018-08-08 19:48 UTC (permalink / raw) To: Junio C Hamano, Nguyễn Thái Ngọc Duy Cc: git, pawelparuzel95, peff, sandals, Elijah Newren, tboegi On 8/7/2018 3:31 PM, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > >> One nice thing about this is we don't need platform specific code for >> detecting the duplicate entries. I think ce_match_stat() works even >> on Windows. And it's now equally expensive on all platforms :D > > ce_match_stat() may not be a very good measure to see if two paths > refer to the same file, though. After a fresh checkout, I would not > be surprised if two completely unrelated paths have the same size > and have same mtime/ctime. In its original use case, i.e. "I have > one specific path in mind and took a snapshot of its metadata > earlier. Is it still the same, or has it been touched?", that may > be sufficient to detect that the path has been _modified_, but > without reliable inum, it may be a poor measure to say two paths > refer to the same. I agree with Junio on this one. The mtime values are sloppy at best. On FAT file systems, they have 2 second resolution. Even NTFS IIRC has only 100ns resolution, so we might get a lot of false matches using this technique, right? It might be better to build an equivalence-class hash-map for the colliding entries. Compute a "normalized" version of the pathname (such as convert to lowercase, strip final-dots/spaces, strip the digits following tilda of a shortname, and etc for the MAC's UTF-isms). Then when you rescan the index entries to find the matches, apply the equivalence operator on the pathname and do the hashmap lookup. When you find a match, you have a "potential" collider pair (I say potential only because of the ambiguity of shortnames). Then we can use inum/file-index/whatever to see if they actually collide. Jeff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-08 19:48 ` Jeff Hostetler @ 2018-08-08 22:31 ` Jeff King 2018-08-09 0:41 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-08 22:31 UTC (permalink / raw) To: Jeff Hostetler Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, git, pawelparuzel95, sandals, Elijah Newren, tboegi On Wed, Aug 08, 2018 at 03:48:04PM -0400, Jeff Hostetler wrote: > > ce_match_stat() may not be a very good measure to see if two paths > > refer to the same file, though. After a fresh checkout, I would not > > be surprised if two completely unrelated paths have the same size > > and have same mtime/ctime. In its original use case, i.e. "I have > > one specific path in mind and took a snapshot of its metadata > > earlier. Is it still the same, or has it been touched?", that may > > be sufficient to detect that the path has been _modified_, but > > without reliable inum, it may be a poor measure to say two paths > > refer to the same. > > I agree with Junio on this one. The mtime values are sloppy at best. > On FAT file systems, they have 2 second resolution. Even NTFS IIRC > has only 100ns resolution, so we might get a lot of false matches > using this technique, right? Yeah, I think anything less than inode (or some system equivalent) is going to be too flaky. > It might be better to build an equivalence-class hash-map for the > colliding entries. Compute a "normalized" version of the pathname > (such as convert to lowercase, strip final-dots/spaces, strip the > digits following tilda of a shortname, and etc for the MAC's UTF-isms). > Then when you rescan the index entries to find the matches, apply the > equivalence operator on the pathname and do the hashmap lookup. > When you find a match, you have a "potential" collider pair (I say > potential only because of the ambiguity of shortnames). Then we > can use inum/file-index/whatever to see if they actually collide. I think we really want to avoid doing that normalization ourselves if we can. There are just too many filesystem-specific rules. If we have an equivalence-class hashmap and feed it inodes (or again, some system equivalent) as the keys, we should get buckets of collisions. I started to write a "something like this..." earlier, but got bogged down in boilerplate around the C hashmap. But here it is in perl. ;) -- >8 -- # pretend we have these paths in our index paths='foo FOO and some other paths' # create them; this will make a single path on a case-insensitive system for i in $paths; do echo $i >$i done # now find the duplicates perl -le ' for my $path (@ARGV) { # this would be an ntfs unique-id on Windows my $inode = (lstat($path))[1]; push @{$h{$inode}}, $path; } for my $group (grep { @$_ > 1 } values(%h)) { print "group:"; print " ", $_ for (@$group); } ' $paths -- >8 -- which should show the obvious pair (it does for me on vfat-on-linux, though where it gets those inodes from, I have no idea ;) ). -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-08 22:31 ` Jeff King @ 2018-08-09 0:41 ` Junio C Hamano 2018-08-09 14:23 ` Jeff King 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-09 0:41 UTC (permalink / raw) To: Jeff King Cc: Jeff Hostetler, Nguyễn Thái Ngọc Duy, git, pawelparuzel95, sandals, Elijah Newren, tboegi Jeff King <peff@peff.net> writes: > I think we really want to avoid doing that normalization ourselves if we > can. There are just too many filesystem-specific rules. Exactly; not having to learn these rules is the major (if not whole) point of the "let checkout notice the collision and then deal with it" approach. Let's not forget that. > If we have an equivalence-class hashmap and feed it inodes (or again, > some system equivalent) as the keys, we should get buckets of > collisions. I guess one way to get "some system equivalent" that can be used as the last resort, when there absolutely is no inum equivalent, is to rehash the working tree file that shouldn't be there when we detect a collision. If we found that there is something when we tried to write out "Foo.txt", if we open "Foo.txt" on the working tree and hash-object it, we should find the matching blob somewhere in the index _before_ "Foo.txt". On a case-insensitive filesytem, it may well be "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt" only differ in case. Of course, that's really the last resort, as it would be costly, but this is something that only need to happen on the "unusual" case in the error codepath, so... ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 0:41 ` Junio C Hamano @ 2018-08-09 14:23 ` Jeff King 2018-08-09 21:14 ` Jeff Hostetler 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-09 14:23 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff Hostetler, Nguyễn Thái Ngọc Duy, git, pawelparuzel95, sandals, Elijah Newren, tboegi On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote: > > If we have an equivalence-class hashmap and feed it inodes (or again, > > some system equivalent) as the keys, we should get buckets of > > collisions. > > I guess one way to get "some system equivalent" that can be used as > the last resort, when there absolutely is no inum equivalent, is to > rehash the working tree file that shouldn't be there when we detect > a collision. > > If we found that there is something when we tried to write out > "Foo.txt", if we open "Foo.txt" on the working tree and hash-object > it, we should find the matching blob somewhere in the index _before_ > "Foo.txt". On a case-insensitive filesytem, it may well be > "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt" > only differ in case. Clever. You might still run into false positives when there is duplicated content in the repository (especially, say, zero-length files). But the fact that you only do the hashing on known duplicates helps with that. One of the things I did like about the equivalence-class approach is that it can be done in a single linear pass in the worst case. Whereas anything that searches when we see a collision is quite likely to be quadratic. But as I said before, it may not be worth worrying too much about that for an error code path where we expect the number of collisions to be small. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 14:23 ` Jeff King @ 2018-08-09 21:14 ` Jeff Hostetler 2018-08-09 21:34 ` Jeff King 2018-08-09 21:40 ` Elijah Newren 0 siblings, 2 replies; 96+ messages in thread From: Jeff Hostetler @ 2018-08-09 21:14 UTC (permalink / raw) To: Jeff King, Junio C Hamano Cc: Nguyễn Thái Ngọc Duy, git, pawelparuzel95, sandals, Elijah Newren, tboegi On 8/9/2018 10:23 AM, Jeff King wrote: > On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote: > >>> If we have an equivalence-class hashmap and feed it inodes (or again, >>> some system equivalent) as the keys, we should get buckets of >>> collisions. >> >> I guess one way to get "some system equivalent" that can be used as >> the last resort, when there absolutely is no inum equivalent, is to >> rehash the working tree file that shouldn't be there when we detect >> a collision. >> >> If we found that there is something when we tried to write out >> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object >> it, we should find the matching blob somewhere in the index _before_ >> "Foo.txt". On a case-insensitive filesytem, it may well be >> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt" >> only differ in case. > > Clever. You might still run into false positives when there is > duplicated content in the repository (especially, say, zero-length > files). But the fact that you only do the hashing on known duplicates > helps with that. > I worry that the false positives make this a non-starter. I mean, if clone creates files 'A' and 'B' (both equal) and then tries to create 'b', would the collision code reports that 'b' collided with 'A' because that was the first OID match? Ideally with this scheme we'd have to search the entire index prior to 'b' and then report that 'b' collided with either 'A' or 'B'. Neither message instills confidence. And there's no way to prefer answer 'B' over 'A' without using knowledge of the FS name mangling/aliasing rules -- unless we want to just assume ignore-case for this iteration. > One of the things I did like about the equivalence-class approach is > that it can be done in a single linear pass in the worst case. Whereas > anything that searches when we see a collision is quite likely to be > quadratic. But as I said before, it may not be worth worrying too much > about that for an error code path where we expect the number of > collisions to be small. > > -Peff > Sorry to be paranoid, but I have an index with 3.5M entries, the word "quadratic" rings all kinds of alarm bells for me. :-) Granted, we expect the number of collisions to be small, but searching back for each collision over the already-populated portion of the index could be expensive. Jeff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:14 ` Jeff Hostetler @ 2018-08-09 21:34 ` Jeff King 2018-08-09 21:40 ` Elijah Newren 1 sibling, 0 replies; 96+ messages in thread From: Jeff King @ 2018-08-09 21:34 UTC (permalink / raw) To: Jeff Hostetler Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, git, pawelparuzel95, sandals, Elijah Newren, tboegi On Thu, Aug 09, 2018 at 05:14:16PM -0400, Jeff Hostetler wrote: > > Clever. You might still run into false positives when there is > > duplicated content in the repository (especially, say, zero-length > > files). But the fact that you only do the hashing on known duplicates > > helps with that. > > > > I worry that the false positives make this a non-starter. I mean, if > clone creates files 'A' and 'B' (both equal) and then tries to create > 'b', would the collision code reports that 'b' collided with 'A' because > that was the first OID match? Ideally with this scheme we'd have to > search the entire index prior to 'b' and then report that 'b' collided > with either 'A' or 'B'. Neither message instills confidence. And > there's no way to prefer answer 'B' over 'A' without using knowledge > of the FS name mangling/aliasing rules -- unless we want to just assume > ignore-case for this iteration. Yeah. If we can get usable unique ids (inode or otherwise) of some form on each system, I think I prefer that. It's much easier to reason about. > Sorry to be paranoid, but I have an index with 3.5M entries, the word > "quadratic" rings all kinds of alarm bells for me. :-) > > Granted, we expect the number of collisions to be small, but searching > back for each collision over the already-populated portion of the index > could be expensive. Heh. Yeah, it's really O(n*m), and we expect a small "m". But of course the two are equal in the worst case. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:14 ` Jeff Hostetler 2018-08-09 21:34 ` Jeff King @ 2018-08-09 21:40 ` Elijah Newren 2018-08-09 21:44 ` Jeff King 2018-08-09 22:07 ` Junio C Hamano 1 sibling, 2 replies; 96+ messages in thread From: Elijah Newren @ 2018-08-09 21:40 UTC (permalink / raw) To: Jeff Hostetler Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc, Git Mailing List, pawelparuzel95, brian m. carlson, Torsten Bögershausen On Thu, Aug 9, 2018 at 2:14 PM Jeff Hostetler <git@jeffhostetler.com> wrote: > On 8/9/2018 10:23 AM, Jeff King wrote: > > On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote: > >> If we found that there is something when we tried to write out > >> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object > >> it, we should find the matching blob somewhere in the index _before_ > >> "Foo.txt". On a case-insensitive filesytem, it may well be > >> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt" > >> only differ in case. > > > > Clever. You might still run into false positives when there is > > duplicated content in the repository (especially, say, zero-length > > files). But the fact that you only do the hashing on known duplicates > > helps with that. > > I worry that the false positives make this a non-starter. I mean, if > clone creates files 'A' and 'B' (both equal) and then tries to create > 'b', would the collision code reports that 'b' collided with 'A' because > that was the first OID match? Ideally with this scheme we'd have to > search the entire index prior to 'b' and then report that 'b' collided > with either 'A' or 'B'. Neither message instills confidence. And > there's no way to prefer answer 'B' over 'A' without using knowledge > of the FS name mangling/aliasing rules -- unless we want to just assume > ignore-case for this iteration. A possibly crazy idea: Don't bother reporting the other filename; just report the OID instead. "Error: Foo.txt cannot be checked out because another file with hash <whatever> is in the way." Maybe even add a hint for the user: "Run `git ls-files -s` to see see all files and their hash". Whatever the exact wording for the error message, just create a nice post on stackoverflow.com explaining the various weird filesystems out there (VFAT, NTFS, HFS, APFS, etc) and how they cause differing filenames to be written to the same location. Have a bunch of folks vote it up so it has some nice search-engine juice. The error message isn't quite as good, but does the user really need all the names of the file? If so, we gave them enough information to figure it out, and this is a really unusual case anyway, right? Besides, now we're back to linear performance.... ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:40 ` Elijah Newren @ 2018-08-09 21:44 ` Jeff King 2018-08-09 21:53 ` Elijah Newren 2018-08-09 22:07 ` Junio C Hamano 1 sibling, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-09 21:44 UTC (permalink / raw) To: Elijah Newren Cc: Jeff Hostetler, Junio C Hamano, Nguyễn Thái Ngọc, Git Mailing List, pawelparuzel95, brian m. carlson, Torsten Bögershausen On Thu, Aug 09, 2018 at 02:40:58PM -0700, Elijah Newren wrote: > > I worry that the false positives make this a non-starter. I mean, if > > clone creates files 'A' and 'B' (both equal) and then tries to create > > 'b', would the collision code reports that 'b' collided with 'A' because > > that was the first OID match? Ideally with this scheme we'd have to > > search the entire index prior to 'b' and then report that 'b' collided > > with either 'A' or 'B'. Neither message instills confidence. And > > there's no way to prefer answer 'B' over 'A' without using knowledge > > of the FS name mangling/aliasing rules -- unless we want to just assume > > ignore-case for this iteration. > > A possibly crazy idea: Don't bother reporting the other filename; just > report the OID instead. > > "Error: Foo.txt cannot be checked out because another file with hash > <whatever> is in the way." Maybe even add a hint for the user: "Run > `git ls-files -s` to see see all files and their hash". > > Whatever the exact wording for the error message, just create a nice > post on stackoverflow.com explaining the various weird filesystems out > there (VFAT, NTFS, HFS, APFS, etc) and how they cause differing > filenames to be written to the same location. Have a bunch of folks > vote it up so it has some nice search-engine juice. Actually, I kind of like the simplicity of that. It puts the human brain in the loop. > The error message isn't quite as good, but does the user really need > all the names of the file? If so, we gave them enough information to > figure it out, and this is a really unusual case anyway, right? > Besides, now we're back to linear performance.... Well, it's still quadratic when they run O(n) iterations of "git ls-files -s | grep $colliding_oid". You've just pushed the second linear search onto the user. ;) -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:44 ` Jeff King @ 2018-08-09 21:53 ` Elijah Newren 2018-08-09 21:59 ` Jeff King 0 siblings, 1 reply; 96+ messages in thread From: Elijah Newren @ 2018-08-09 21:53 UTC (permalink / raw) To: Jeff King Cc: Jeff Hostetler, Junio C Hamano, Nguyễn Thái Ngọc, Git Mailing List, pawelparuzel95, brian m. carlson, Torsten Bögershausen On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@peff.net> wrote: > > The error message isn't quite as good, but does the user really need > > all the names of the file? If so, we gave them enough information to > > figure it out, and this is a really unusual case anyway, right? > > Besides, now we're back to linear performance.... > > Well, it's still quadratic when they run O(n) iterations of "git > ls-files -s | grep $colliding_oid". You've just pushed the second linear > search onto the user. ;) Wouldn't that be their own fault for not running git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2 ? ;-) ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:53 ` Elijah Newren @ 2018-08-09 21:59 ` Jeff King 2018-08-09 23:05 ` Elijah Newren 0 siblings, 1 reply; 96+ messages in thread From: Jeff King @ 2018-08-09 21:59 UTC (permalink / raw) To: Elijah Newren Cc: Jeff Hostetler, Junio C Hamano, Nguyễn Thái Ngọc, Git Mailing List, pawelparuzel95, brian m. carlson, Torsten Bögershausen On Thu, Aug 09, 2018 at 02:53:42PM -0700, Elijah Newren wrote: > On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@peff.net> wrote: > > > The error message isn't quite as good, but does the user really need > > > all the names of the file? If so, we gave them enough information to > > > figure it out, and this is a really unusual case anyway, right? > > > Besides, now we're back to linear performance.... > > > > Well, it's still quadratic when they run O(n) iterations of "git > > ls-files -s | grep $colliding_oid". You've just pushed the second linear > > search onto the user. ;) > > Wouldn't that be their own fault for not running > git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2 > ? ;-) Man, this thread is the gift that keeps on giving. :) That's still quadratic, isn't it? You've just hidden the second dimension in the single grep call. Now since these are all going to be constant strings, in theory an intelligent grep could stick them all in a search trie, and match each line with complexity k, the length of the matched strings. And since k=40, that's technically still linear overall. -Peff ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:59 ` Jeff King @ 2018-08-09 23:05 ` Elijah Newren 0 siblings, 0 replies; 96+ messages in thread From: Elijah Newren @ 2018-08-09 23:05 UTC (permalink / raw) To: Jeff King Cc: Jeff Hostetler, Junio C Hamano, Nguyễn Thái Ngọc, Git Mailing List, pawelparuzel95, brian m. carlson, Torsten Bögershausen On Thu, Aug 9, 2018 at 2:59 PM Jeff King <peff@peff.net> wrote: > On Thu, Aug 09, 2018 at 02:53:42PM -0700, Elijah Newren wrote: > > > On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@peff.net> wrote: > > > > The error message isn't quite as good, but does the user really need > > > > all the names of the file? If so, we gave them enough information to > > > > figure it out, and this is a really unusual case anyway, right? > > > > Besides, now we're back to linear performance.... > > > > > > Well, it's still quadratic when they run O(n) iterations of "git > > > ls-files -s | grep $colliding_oid". You've just pushed the second linear > > > search onto the user. ;) > > > > Wouldn't that be their own fault for not running > > git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2 > > ? ;-) > > Man, this thread is the gift that keeps on giving. :) > > That's still quadratic, isn't it? You've just hidden the second > dimension in the single grep call. It may depend on the implementation within grep. If I had remembered to pass -F, and if wikipedia's claims[1] about the Aho-Corasick algotihrm being the basis of the original Unix command fgrep, and it's claims that this algorithm is "linear in the length of the strings plus the length of the searched text plus the number of output matches", and I didn't just misunderstand something there, then it looks like it could be linear. [1] https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm Of course, the grep implementation could also do something stupid and provide dramatically worse behavior than running the command N times specifying one pattern each time.[2] [2] http://savannah.gnu.org/bugs/?16305 > Now since these are all going to be constant strings, in theory an > intelligent grep could stick them all in a search trie, and match each > line with complexity k, the length of the matched strings. And since > k=40, that's technically still linear overall. Looks like Aho-Corasick uses "a finite-state machine that resembles a trie", so definitely along those lines. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems 2018-08-09 21:40 ` Elijah Newren 2018-08-09 21:44 ` Jeff King @ 2018-08-09 22:07 ` Junio C Hamano 1 sibling, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-09 22:07 UTC (permalink / raw) To: Elijah Newren Cc: Jeff Hostetler, Jeff King, Nguyễn Thái Ngọc, Git Mailing List, pawelparuzel95, brian m. carlson, Torsten Bögershausen Elijah Newren <newren@gmail.com> writes: > A possibly crazy idea: Don't bother reporting the other filename; just > report the OID instead. > > "Error: Foo.txt cannot be checked out because another file with hash > <whatever> is in the way." Maybe even add a hint for the user: "Run > `git ls-files -s` to see see all files and their hash". Once we start using OID to talk to humans, we are already lost. At that point, it would be 1000% better to - not check out Foo.txt to unlink and overwrite; instead leave the content that is already in the working tree as-is; and - report that Foo.txt was not checked out as something else that also claims to deserve that path was checked out already. Then the user can inspect Foo.txt and realize it is actually the contents that should be in foo.txt or whatever. ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v3 0/1] clone: warn on colidding entries on checkout 2018-08-07 19:01 ` [PATCH v2] " Nguyễn Thái Ngọc Duy 2018-08-07 19:31 ` Junio C Hamano @ 2018-08-10 15:36 ` Nguyễn Thái Ngọc Duy 2018-08-10 15:36 ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy ` (2 more replies) 1 sibling, 3 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-08-10 15:36 UTC (permalink / raw) To: pclouds; +Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi There are lots of suggestions on optimizing this stuff, but since this problem does not affect me to begin with, I'm reluctant to make more changes and going to stay simple, stupid and slow. I could continue to do small updates if needed. But for bigger changes, consider this patch dropped by me. v3 now uses inode on UNIXy platforms for checking colliding items. I still don't try to separate colliding groups because it should be quite obvious once you look at the colliding list (and most of the time I suspect we only have one or two groups). Since on Windows we can't really have colliding groups (no inode to check) so the warning message is to me a bit misleading. But I frankly don't want to put more effort in this. Nguyễn Thái Ngọc Duy (1): clone: report duplicate entries on case-insensitive filesystems builtin/clone.c | 1 + cache.h | 2 ++ entry.c | 32 ++++++++++++++++++++++++++++++++ unpack-trees.c | 22 ++++++++++++++++++++++ unpack-trees.h | 1 + 5 files changed, 58 insertions(+) -- 2.18.0.915.gd571298aae ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems 2018-08-10 15:36 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy @ 2018-08-10 15:36 ` Nguyễn Thái Ngọc Duy 2018-08-10 16:42 ` Junio C Hamano 2018-08-11 10:09 ` SZEDER Gábor 2018-08-10 16:12 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Junio C Hamano 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2 siblings, 2 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-08-10 15:36 UTC (permalink / raw) To: pclouds; +Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. This patch is tested with vim-colorschemes repository on a JFS partition with case insensitive support on Linux. This repository has two files darkBlue.vim and darkblue.vim. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/clone.c | 1 + cache.h | 2 ++ entry.c | 32 ++++++++++++++++++++++++++++++++ unpack-trees.c | 22 ++++++++++++++++++++++ unpack-trees.h | 1 + 5 files changed, 58 insertions(+) diff --git a/builtin/clone.c b/builtin/clone.c index 9ebb5acf56..38d5609282 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -748,6 +748,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8dc7134f00..cdf0984707 100644 --- a/cache.h +++ b/cache.h @@ -1515,9 +1515,11 @@ struct checkout { const char *base_dir; int base_dir_len; struct delayed_checkout *delayed_checkout; + int *nr_duplicates; unsigned force:1, quiet:1, not_new:1, + clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..f2d73e6255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,35 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_duplicate_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i; + int *count = state->nr_duplicates; + + if (!count) + BUG("state->nr_duplicates must not be NULL"); + + ce->ce_flags |= CE_MATCHED; + (*count)++; + +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if (ce_uptodate(dup) && + dup->ce_stat_data.sd_ino == st->st_ino) { + dup->ce_flags |= CE_MATCHED; + (*count)++; + break; + } + } +#endif +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +484,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_duplicate_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/unpack-trees.c b/unpack-trees.c index f9efee0836..d4fece913c 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -344,12 +344,20 @@ static int check_updates(struct unpack_trees_options *o) struct index_state *index = &o->result; struct checkout state = CHECKOUT_INIT; int i; + int nr_duplicates = 0; state.force = 1; state.quiet = 1; state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + state.nr_duplicates = &nr_duplicates; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -414,6 +422,20 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(&state); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + + if (o->clone && state.nr_duplicates) { + warning(_("the following paths have collided and only one from the same\n" + "colliding group is in the working tree:\n")); + for (i = 0; i < index->cache_nr; i++) { + struct cache_entry *ce = index->cache[i]; + + if (!(ce->ce_flags & CE_MATCHED)) + continue; + fprintf(stderr, " '%s'\n", ce->name); + ce->ce_flags &= ~CE_MATCHED; + } + } + return errs != 0; } diff --git a/unpack-trees.h b/unpack-trees.h index c2b434c606..d940f1c5c2 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -42,6 +42,7 @@ struct unpack_trees_options { unsigned int reset, merge, update, + clone, index_only, nontrivial_merge, trivial_merges_only, -- 2.18.0.915.gd571298aae ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems 2018-08-10 15:36 ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy @ 2018-08-10 16:42 ` Junio C Hamano 2018-08-11 10:09 ` SZEDER Gábor 1 sibling, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-10 16:42 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, newren, pawelparuzel95, peff, sandals, tboegi Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > +static void mark_duplicate_entries(const struct checkout *state, > + struct cache_entry *ce, struct stat *st) > +{ > + int i; > + int *count = state->nr_duplicates; > + > + if (!count) > + BUG("state->nr_duplicates must not be NULL"); > + > + ce->ce_flags |= CE_MATCHED; > + (*count)++; We tried to check out ce and found out that the path is already occupied, whose stat data is in st. We increment (*count)++ to signal the fact that ce itself had trouble. > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > + for (i = 0; i < state->istate->cache_nr; i++) { Do we want to count up all the way? IOW, don't we want to stop when we see the ce itself? > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + > + if (ce_uptodate(dup) && > + dup->ce_stat_data.sd_ino == st->st_ino) { > + dup->ce_flags |= CE_MATCHED; Is checking ce_uptodate(dup) done to check that the 'dup' is a freshly checked out entry? I'd like to understand why it is necessary, especially beause you know you only call this function in the o->clone codepath, so everything ought to be fresh. I agree that the check you have above is sufficient on inum capable systems. I also suspect that Windows folks, if they really wish to report which other path(s) are colliding with ce, would want to replace this whole loop with a platform specific helper function, not just the condition in the above "if" statement [*1*], so what we see here should be sufficient for now. Side note #1. It is sufficient for inum capable systems to look at 'dup' and 'st' to cheaply identify collision, but it may be more efficient for other systems to look at ce and the previous entries of the index, which allows them to use platform specific API that knows case-folding and other pathname munging specifics. Going that way lets them minimally emulate 'struct stat', which may be expensive to fill. > + (*count)++; And then we increment (*count)++ for this _other_ one that collided with ce. > + break; And we leave the loop. There may be somebody else that collided when 'dup' was checked out, making this triple collision, but in such a case, we won't be coming this far in this loop (i.e. we would have continued on this 'dup' as it is marked as CE_MATCHED), so there is no reason to continue looking for further collision with 'ce' here. So nr_duplicates kept track by these (*count)++ will indicate number of paths involved in any collision correctly, I think. Makes sense. > + } > + } > +#endif > +} > + > /* > * Write the contents from ce out to the working tree. > * > @@ -455,6 +484,9 @@ int checkout_entry(struct cache_entry *ce, > return -1; > } > > + if (state->clone) > + mark_duplicate_entries(state, ce, &st); > + > /* > * We unlink the old file, to get the new one with the > * right permissions (including umask, which is nasty > diff --git a/unpack-trees.c b/unpack-trees.c > index f9efee0836..d4fece913c 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -344,12 +344,20 @@ static int check_updates(struct unpack_trees_options *o) > struct index_state *index = &o->result; > struct checkout state = CHECKOUT_INIT; > int i; > + int nr_duplicates = 0; > > state.force = 1; > state.quiet = 1; > state.refresh_cache = 1; > state.istate = index; > > + if (o->clone) { > + state.clone = 1; > + state.nr_duplicates = &nr_duplicates; > + for (i = 0; i < index->cache_nr; i++) > + index->cache[i]->ce_flags &= ~CE_MATCHED; > + } > + > progress = get_progress(o); > > if (o->update) > @@ -414,6 +422,20 @@ static int check_updates(struct unpack_trees_options *o) > errs |= finish_delayed_checkout(&state); > if (o->update) > git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); > + > + if (o->clone && state.nr_duplicates) { Does state.nr_duplicates have to be a pointer to int? I am wondering if state.clone bit is sufficient to protect codepaths that needs to spend cycles to keep track of that number, in which case we can do without local variable nr_duplicates here, and the set-up code in the previous hunk not to store the pointer to it in state.nr_duplicates (instead, that field itself can become an int). > + warning(_("the following paths have collided and only one from the same\n" > + "colliding group is in the working tree:\n")); As we are not grouping (and do not particularly see the need to group), perhaps we can do without mentioning group in the warning? e.g. some of the following may have been checked out incorrectly due to limitation of the filesystem like case insensitivity. > + for (i = 0; i < index->cache_nr; i++) { > + struct cache_entry *ce = index->cache[i]; > + > + if (!(ce->ce_flags & CE_MATCHED)) > + continue; > + fprintf(stderr, " '%s'\n", ce->name); > + ce->ce_flags &= ~CE_MATCHED; > + } > + } ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems 2018-08-10 15:36 ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-08-10 16:42 ` Junio C Hamano @ 2018-08-11 10:09 ` SZEDER Gábor 2018-08-11 13:16 ` Duy Nguyen 1 sibling, 1 reply; 96+ messages in thread From: SZEDER Gábor @ 2018-08-11 10:09 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: SZEDER Gábor, git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > This patch is tested with vim-colorschemes repository on a JFS partition > with case insensitive support on Linux. This repository has two files > darkBlue.vim and darkblue.vim. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> This patch makes 'clone http repository' in 't5551-http-fetch-smart.sh' fail with: --- exp 2018-08-11 02:29:45.216641851 +0000 +++ actual.smudged 2018-08-11 02:29:45.264642318 +0000 @@ -15,3 +15,5 @@ < Pragma: no-cache < Cache-Control: no-cache, max-age=0, must-revalidate < Content-Type: application/x-git-upload-pack-result +> warning: the following paths have collided and only one from the same +> colliding group is in the working tree: This highlights a few issues: - This test runs GIT_TRACE_CURL=true git clone --quiet <URL> 2>err i.e. the curl trace and any errors or warnings from 'git clone' end up in the same file. This test then removes a lot of uninteresting headers before the comparison with 'test_cmp', but the warning remains and then triggers the test failure. Several other tests run a command like this, but those don't use 'test_cmp', but only grep the output to verify the presence or absence of certain headers. I'm inclined to think that it would be prudent to change all these tests to send the curl trace to a dedicated file (and then '--quiet' can be removed as well). Though, arguably, had that been already the case, this test wouldn't have failed, and we probably wouldn't have noticed that something is wrong. - But what triggered this warning in the first place? 'git clone' didn't print anything after the that warning, even when I re-run the test with that '--quiet' option removed. Furthermore, the cloned repository contains a single file, so there could be no case/folding collision among multiple files. I also notice that this patch doesn't add any tests... :) - I didn't understand this warning, I had to read the corresponding commit message to figure out what it's all about. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems 2018-08-11 10:09 ` SZEDER Gábor @ 2018-08-11 13:16 ` Duy Nguyen 2018-08-13 16:55 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Duy Nguyen @ 2018-08-11 13:16 UTC (permalink / raw) To: SZEDER Gábor Cc: Jeff Hostetler, Git Mailing List, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, Torsten Bögershausen On Sat, Aug 11, 2018 at 12:09 PM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > > > Paths that only differ in case work fine in a case-sensitive > > filesystems, but if those repos are cloned in a case-insensitive one, > > you'll get problems. The first thing to notice is "git status" will > > never be clean with no indication what exactly is "dirty". > > > > This patch helps the situation a bit by pointing out the problem at > > clone time. Even though this patch talks about case sensitivity, the > > patch makes no assumption about folding rules by the filesystem. It > > simply observes that if an entry has been already checked out at clone > > time when we're about to write a new path, some folding rules are > > behind this. > > > > This patch is tested with vim-colorschemes repository on a JFS partition > > with case insensitive support on Linux. This repository has two files > > darkBlue.vim and darkblue.vim. > > > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > > This patch makes 'clone http repository' in > 't5551-http-fetch-smart.sh' fail with: > > --- exp 2018-08-11 02:29:45.216641851 +0000 > +++ actual.smudged 2018-08-11 02:29:45.264642318 +0000 > @@ -15,3 +15,5 @@ > < Pragma: no-cache > < Cache-Control: no-cache, max-age=0, must-revalidate > < Content-Type: application/x-git-upload-pack-result > +> warning: the following paths have collided and only one from the same > +> colliding group is in the working tree: I was careless and checked the wrong variable (should have checked nr_duplicates not state.nr_duplicates; the second is a pointer). So we always get this warning (and with no following list of files) > I also notice that this patch doesn't add any tests... :) This is platform specific and I was to be frank a bit lazy. Will consider adding a test with CASE_INSENSITIVE_FS after this. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems 2018-08-11 13:16 ` Duy Nguyen @ 2018-08-13 16:55 ` Junio C Hamano 2018-08-13 17:12 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-13 16:55 UTC (permalink / raw) To: Duy Nguyen Cc: SZEDER Gábor, Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, Torsten Bögershausen Duy Nguyen <pclouds@gmail.com> writes: > I was careless and checked the wrong variable (should have checked > nr_duplicates not state.nr_duplicates; the second is a pointer). So we > always get this warning (and with no following list of files) Heh, does that bug go away if you got rid of the pointer-ness of the field and store the value directly in there? >> I also notice that this patch doesn't add any tests... :) > > This is platform specific and I was to be frank a bit lazy. Will > consider adding a test with CASE_INSENSITIVE_FS after this. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems 2018-08-13 16:55 ` Junio C Hamano @ 2018-08-13 17:12 ` Duy Nguyen 0 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-13 17:12 UTC (permalink / raw) To: Junio C Hamano Cc: SZEDER Gábor, Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, Torsten Bögershausen On Mon, Aug 13, 2018 at 6:55 PM Junio C Hamano <gitster@pobox.com> wrote: > > Duy Nguyen <pclouds@gmail.com> writes: > > > I was careless and checked the wrong variable (should have checked > > nr_duplicates not state.nr_duplicates; the second is a pointer). So we > > always get this warning (and with no following list of files) > > Heh, does that bug go away if you got rid of the pointer-ness of the > field and store the value directly in there? You mean replacing the pointer with a real counter in struct checkout? That would not work (it was my first option) because struct checkout is passed around as a const struct. entry.c code is not allowed to make any updates there. So I got rid of both "nr_duplicates" and just count again at the bottom of check_updates(). It's not that expensive and it simplifies the code. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 0/1] clone: warn on colidding entries on checkout 2018-08-10 15:36 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy 2018-08-10 15:36 ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy @ 2018-08-10 16:12 ` Junio C Hamano 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2 siblings, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-10 16:12 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, newren, pawelparuzel95, peff, sandals, tboegi Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > There are lots of suggestions on optimizing this stuff, but since this > problem does not affect me to begin with, I'm reluctant to make more > changes and going to stay simple, stupid and slow. I could continue to > do small updates if needed. But for bigger changes, consider this > patch dropped by me. > > v3 now uses inode on UNIXy platforms for checking colliding items. I > still don't try to separate colliding groups because it should be > quite obvious once you look at the colliding list (and most of the > time I suspect we only have one or two groups). I think that design decision is fine. We can extend it later if needed, but I would not be surprised if what you have here is sufficient. Another possible follow-up in the future may be to encapsulate the "I have a cache-entry 'dup', and stat data 'st' taken for a path in the working tree. Does it look likely that the latter is the result of checking out the former?" logic, which you currently has a hard-coded if() statement condition, into a helper function and make its implementation platform dependent. ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-10 15:36 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy 2018-08-10 15:36 ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-08-10 16:12 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Junio C Hamano @ 2018-08-12 9:07 ` Nguyễn Thái Ngọc Duy 2018-08-13 15:32 ` Jeff Hostetler ` (3 more replies) 2 siblings, 4 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-08-12 9:07 UTC (permalink / raw) To: pclouds Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, tboegi, SZEDER Gábor Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. This patch is tested with vim-colorschemes repository on a JFS partition with case insensitive support on Linux. This repository has two files darkBlue.vim and darkblue.vim. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- v4 removes nr_duplicates (and fixes that false warning Szeder reported). It also hints about case insensitivity as a cause of problem because it's most likely the case when this warning shows up. builtin/clone.c | 1 + cache.h | 1 + entry.c | 28 ++++++++++++++++++++++++++++ t/t5601-clone.sh | 8 +++++++- unpack-trees.c | 28 ++++++++++++++++++++++++++++ unpack-trees.h | 1 + 6 files changed, 66 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, + clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..c70340df8e 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i; + + ce->ce_flags |= CE_MATCHED; + +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if (dup->ce_stat_data.sd_ino == st->st_ino) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +#endif +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd0680f11e..443df048ef 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o) state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(&state); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + + if (o->clone) { + int printed_warning = 0; + + for (i = 0; i < index->cache_nr; i++) { + struct cache_entry *ce = index->cache[i]; + + if (!(ce->ce_flags & CE_MATCHED)) + continue; + + if (!printed_warning) { + warning(_("the following paths have collided (e.g. case-sensitive paths\n" + "on a case-insensitive filesystem) and only one from the same\n" + "colliding group is in the working tree:\n")); + printed_warning = 1; + } + + fprintf(stderr, " '%s'\n", ce->name); + ce->ce_flags &= ~CE_MATCHED; + } + } + return errs != 0; } diff --git a/unpack-trees.h b/unpack-trees.h index c2b434c606..d940f1c5c2 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -42,6 +42,7 @@ struct unpack_trees_options { unsigned int reset, merge, update, + clone, index_only, nontrivial_merge, trivial_merges_only, -- 2.18.0.1004.g6639190530 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy @ 2018-08-13 15:32 ` Jeff Hostetler 2018-08-13 17:18 ` Junio C Hamano ` (2 subsequent siblings) 3 siblings, 0 replies; 96+ messages in thread From: Jeff Hostetler @ 2018-08-13 15:32 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, gitster, newren, pawelparuzel95, peff, sandals, tboegi, SZEDER Gábor On 8/12/2018 5:07 AM, Nguyễn Thái Ngọc Duy wrote: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > This patch is tested with vim-colorschemes repository on a JFS partition > with case insensitive support on Linux. This repository has two files > darkBlue.vim and darkblue.vim. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > v4 removes nr_duplicates (and fixes that false warning Szeder > reported). It also hints about case insensitivity as a cause of > problem because it's most likely the case when this warning shows up. > > builtin/clone.c | 1 + > cache.h | 1 + > entry.c | 28 ++++++++++++++++++++++++++++ > t/t5601-clone.sh | 8 +++++++- > unpack-trees.c | 28 ++++++++++++++++++++++++++++ > unpack-trees.h | 1 + > 6 files changed, 66 insertions(+), 1 deletion(-) > > diff --git a/builtin/clone.c b/builtin/clone.c > index 5c439f1394..0702b0e9d0 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) > memset(&opts, 0, sizeof opts); > opts.update = 1; > opts.merge = 1; > + opts.clone = 1; > opts.fn = oneway_merge; > opts.verbose_update = (option_verbosity >= 0); > opts.src_index = &the_index; > diff --git a/cache.h b/cache.h > index 8b447652a7..6d6138f4f1 100644 > --- a/cache.h > +++ b/cache.h > @@ -1455,6 +1455,7 @@ struct checkout { > unsigned force:1, > quiet:1, > not_new:1, > + clone:1, > refresh_cache:1; > }; > #define CHECKOUT_INIT { NULL, "" } > diff --git a/entry.c b/entry.c > index b5d1d3cf23..c70340df8e 100644 > --- a/entry.c > +++ b/entry.c > @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) > return lstat(path, st); > } > > +static void mark_colliding_entries(const struct checkout *state, > + struct cache_entry *ce, struct stat *st) > +{ > + int i; > + > + ce->ce_flags |= CE_MATCHED; > + > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup == ce) > + break; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > + dup->ce_flags |= CE_MATCHED; > + break; > + } > + } > +#endif > +} > + > /* > * Write the contents from ce out to the working tree. > * > @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, > return -1; > } > > + if (state->clone) > + mark_colliding_entries(state, ce, &st); > + > /* > * We unlink the old file, to get the new one with the > * right permissions (including umask, which is nasty > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 0b62037744..f2eb73bc74 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' > git hash-object -w -t tree --stdin) && > c=$(git commit-tree -m bogus $t) && > git update-ref refs/heads/bogus $c && > - git clone -b bogus . bogus > + git clone -b bogus . bogus 2>warning > ) > ' > > +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' > + grep X icasefs/warning && > + grep x icasefs/warning && > + test_i18ngrep "the following paths have collided" icasefs/warning > +' > + > partial_clone () { > SERVER="$1" && > URL="$2" && > diff --git a/unpack-trees.c b/unpack-trees.c > index cd0680f11e..443df048ef 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o) > state.refresh_cache = 1; > state.istate = index; > > + if (o->clone) { > + state.clone = 1; > + for (i = 0; i < index->cache_nr; i++) > + index->cache[i]->ce_flags &= ~CE_MATCHED; > + } > + > progress = get_progress(o); > > if (o->update) > @@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o) > errs |= finish_delayed_checkout(&state); > if (o->update) > git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); > + > + if (o->clone) { > + int printed_warning = 0; > + > + for (i = 0; i < index->cache_nr; i++) { > + struct cache_entry *ce = index->cache[i]; > + > + if (!(ce->ce_flags & CE_MATCHED)) > + continue; > + > + if (!printed_warning) { > + warning(_("the following paths have collided (e.g. case-sensitive paths\n" > + "on a case-insensitive filesystem) and only one from the same\n" > + "colliding group is in the working tree:\n")); > + printed_warning = 1; > + } > + > + fprintf(stderr, " '%s'\n", ce->name); > + ce->ce_flags &= ~CE_MATCHED; > + } > + } > + If I'm reading this correctly, on Linux and friends, you'll print the names of the files where the collision was detected and the paths of any peers found from the inum matching. And because of the #ifdef'ing on Windows, we'll just get the former (at least for now). That sounds fine. Thanks Jeff > return errs != 0; > } > > diff --git a/unpack-trees.h b/unpack-trees.h > index c2b434c606..d940f1c5c2 100644 > --- a/unpack-trees.h > +++ b/unpack-trees.h > @@ -42,6 +42,7 @@ struct unpack_trees_options { > unsigned int reset, > merge, > update, > + clone, > index_only, > nontrivial_merge, > trivial_merges_only, > ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-08-13 15:32 ` Jeff Hostetler @ 2018-08-13 17:18 ` Junio C Hamano 2018-08-15 19:08 ` Torsten Bögershausen 2018-08-17 16:16 ` [PATCH v5] " Nguyễn Thái Ngọc Duy 3 siblings, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-13 17:18 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, newren, pawelparuzel95, peff, sandals, tboegi, SZEDER Gábor Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > This patch is tested with vim-colorschemes repository on a JFS partition > with case insensitive support on Linux. This repository has two files > darkBlue.vim and darkblue.vim. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > v4 removes nr_duplicates (and fixes that false warning Szeder > reported). It also hints about case insensitivity as a cause of > problem because it's most likely the case when this warning shows up. Ah, you no longer have that counter and the pointer to the counter, as you do not even report how many paths collide ;-) Makes sense. > diff --git a/entry.c b/entry.c > index b5d1d3cf23..c70340df8e 100644 > --- a/entry.c > +++ b/entry.c > @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) > return lstat(path, st); > } > > +static void mark_colliding_entries(const struct checkout *state, > + struct cache_entry *ce, struct stat *st) > +{ > + int i; > + > + ce->ce_flags |= CE_MATCHED; > + > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup == ce) > + break; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > + dup->ce_flags |= CE_MATCHED; > + break; > + } > + } > +#endif > +} OK. The whole loop might want to become a call to a helper function whose implementation is platform dependent in the future, but that should be kept outside the topic and left for a future enhancement. > @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, > return -1; > } > > + if (state->clone) > + mark_colliding_entries(state, ce, &st); OK. I haven't carefully looked at the codepath but is it more involved to instead *not* check out this ce (and leave the working tree file that is already there for another path in the index alone)? I suspect it won't be as simple as if (state->clone) { mark_colliding_entries(state, ce, &st); return -1; } but I think it would give much more pleasant end-user experience if we can do so, especially on GIT_WINDOWS_NATIVE. I would imagine that the first thing those who see the message "foo.txt have collided with something else we are not telling you" would want to do is to see what "foo.txt" contains---and it may be obvious to human that it contains the contents intended for "Foo.txt" instead, if we somehow refrained from overwriting it here, which would compensate for the lack of "this is the other path that collided with your file." It is perfectly an acceptable answer if it is "I looked at it, and it is a lot more involved as there are these fallouts from the codepaths that the control flows later from this point you haven't checked and considered---let's keep overwriting, it is much safer." > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 0b62037744..f2eb73bc74 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' > git hash-object -w -t tree --stdin) && > c=$(git commit-tree -m bogus $t) && > git update-ref refs/heads/bogus $c && > - git clone -b bogus . bogus > + git clone -b bogus . bogus 2>warning > ) > ' > > +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' > + grep X icasefs/warning && > + grep x icasefs/warning && > + test_i18ngrep "the following paths have collided" icasefs/warning > +' > + Ah, I was wondering why possible error message needs to be hidden in the previous test---it is not hiding; it is capturing to look for the paths in the message. Makes sense. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-08-13 15:32 ` Jeff Hostetler 2018-08-13 17:18 ` Junio C Hamano @ 2018-08-15 19:08 ` Torsten Bögershausen 2018-08-15 19:35 ` Duy Nguyen 2018-08-15 19:38 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano 2018-08-17 16:16 ` [PATCH v5] " Nguyễn Thái Ngọc Duy 3 siblings, 2 replies; 96+ messages in thread From: Torsten Bögershausen @ 2018-08-15 19:08 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, SZEDER Gábor On Sun, Aug 12, 2018 at 11:07:14AM +0200, Nguyễn Thái Ngọc Duy wrote: > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > This patch is tested with vim-colorschemes repository on a JFS partition > with case insensitive support on Linux. This repository has two files > darkBlue.vim and darkblue.vim. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > v4 removes nr_duplicates (and fixes that false warning Szeder > reported). It also hints about case insensitivity as a cause of > problem because it's most likely the case when this warning shows up. > > builtin/clone.c | 1 + > cache.h | 1 + > entry.c | 28 ++++++++++++++++++++++++++++ > t/t5601-clone.sh | 8 +++++++- > unpack-trees.c | 28 ++++++++++++++++++++++++++++ > unpack-trees.h | 1 + > 6 files changed, 66 insertions(+), 1 deletion(-) > > diff --git a/builtin/clone.c b/builtin/clone.c > index 5c439f1394..0702b0e9d0 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) > memset(&opts, 0, sizeof opts); > opts.update = 1; > opts.merge = 1; > + opts.clone = 1; > opts.fn = oneway_merge; > opts.verbose_update = (option_verbosity >= 0); > opts.src_index = &the_index; > diff --git a/cache.h b/cache.h > index 8b447652a7..6d6138f4f1 100644 > --- a/cache.h > +++ b/cache.h > @@ -1455,6 +1455,7 @@ struct checkout { > unsigned force:1, > quiet:1, > not_new:1, > + clone:1, > refresh_cache:1; > }; > #define CHECKOUT_INIT { NULL, "" } > diff --git a/entry.c b/entry.c > index b5d1d3cf23..c70340df8e 100644 > --- a/entry.c > +++ b/entry.c > @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) > return lstat(path, st); > } > > +static void mark_colliding_entries(const struct checkout *state, > + struct cache_entry *ce, struct stat *st) > +{ > + int i; > + > + ce->ce_flags |= CE_MATCHED; > + > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup == ce) > + break; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + Should the following be protected by core.checkstat ? if (check_stat) { > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > + dup->ce_flags |= CE_MATCHED; > + break; > + } > + } > +#endif Another thing is that we switch of the ASCII case-folding-detection-logic off for Windows users, even if we otherwise rely on icase. I think we can use fspathcmp() as a fallback. when inodes fail, because we may be on a network file system. (I don't have a test setup at the moment, but what happens with inodes when a Windows machine exports a share to Linux or Mac ?) Is there a chance to get the fspathcmp() back, like this ? static void mark_colliding_entries(const struct checkout *state, struct cache_entry *ce, struct stat *st) { int i; ce->ce_flags |= CE_MATCHED; for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; int folded = 0; if (dup == ce) break; if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; if (!fspathcmp(dup->name, ce->name)) folded = 1; #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino)) folded = 1; #endif if (folded) { dup->ce_flags |= CE_MATCHED; break; } } } ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-15 19:08 ` Torsten Bögershausen @ 2018-08-15 19:35 ` Duy Nguyen 2018-08-16 15:56 ` [PATCH] config.txt: clarify core.checkStat = minimal Nguyễn Thái Ngọc Duy 2018-08-15 19:38 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano 1 sibling, 1 reply; 96+ messages in thread From: Duy Nguyen @ 2018-08-15 19:35 UTC (permalink / raw) To: Torsten Bögershausen Cc: Jeff Hostetler, Git Mailing List, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor On Wed, Aug 15, 2018 at 9:08 PM Torsten Bögershausen <tboegi@web.de> wrote: > > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > > + for (i = 0; i < state->istate->cache_nr; i++) { > > + struct cache_entry *dup = state->istate->cache[i]; > > + > > + if (dup == ce) > > + break; > > + > > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > > + continue; > > + > > Should the following be protected by core.checkstat ? > if (check_stat) { Good catch! st_ino is ignored if core.checkStat is false. I will probably send a separate patch to add more details to config.txt about this key. > > + if (dup->ce_stat_data.sd_ino == st->st_ino) { > > + dup->ce_flags |= CE_MATCHED; > > + break; > > + } > > + } > > +#endif > > Another thing is that we switch of the ASCII case-folding-detection-logic > off for Windows users, even if we otherwise rely on icase. > I think we can use fspathcmp() as a fallback. when inodes fail, > because we may be on a network file system. I admit I did not think about network file system. Will spend some time (and hopefully not on nfs kernel code) on it. For falling back on fspathcmp even on Windows, is it really safe? I'm on Linux and never have to deal with this issue to have any experience. It does sound good though because it should be a subset for any "weird" filesystems out there. > (I don't have a test setup at the moment, but what happens with inodes > when a Windows machine exports a share to Linux or Mac ?) > > Is there a chance to get the fspathcmp() back, like this ? > > static void mark_colliding_entries(const struct checkout *state, > struct cache_entry *ce, struct stat *st) > { > int i; > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > int folded = 0; > > if (dup == ce) > break; > > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > > if (!fspathcmp(dup->name, ce->name)) > folded = 1; > > #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino)) > folded = 1; > #endif > if (folded) { > dup->ce_flags |= CE_MATCHED; > break; > } > } > } > -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH] config.txt: clarify core.checkStat = minimal 2018-08-15 19:35 ` Duy Nguyen @ 2018-08-16 15:56 ` Nguyễn Thái Ngọc Duy 2018-08-16 17:01 ` Junio C Hamano 0 siblings, 1 reply; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-08-16 15:56 UTC (permalink / raw) To: pclouds Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi The description of this key does not really tell what 'minimal' mode checks exactly. More information about this mode can be found in the commit message of c08e4d5b5c (Enable minimal stat checking - 2013-01-22). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- Documentation/config.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index fd8d27e761..5c41314dd5 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -466,6 +466,8 @@ core.checkStat:: and work tree. The user can set this to 'default' or 'minimal'. Default (or explicitly 'default'), is to check all fields, including the sub-second part of mtime and ctime. + 'minimal' only checks size and the whole second part of mtime + and ctime. core.quotePath:: Commands that output paths (e.g. 'ls-files', 'diff'), will -- 2.18.0.1004.g6639190530 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH] config.txt: clarify core.checkStat = minimal 2018-08-16 15:56 ` [PATCH] config.txt: clarify core.checkStat = minimal Nguyễn Thái Ngọc Duy @ 2018-08-16 17:01 ` Junio C Hamano 2018-08-16 18:19 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-16 17:01 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > The description of this key does not really tell what 'minimal' mode > checks exactly. More information about this mode can be found in the > commit message of c08e4d5b5c (Enable minimal stat checking - > 2013-01-22). > While I agree that we need to do _something_, I am not sure if this change adds sufficient value. I _think_ those who wonder if they want to configure this want to know what are _not_ looked at (relative to the "default") more than what are _still_ looked at, partly because the description of "default" is already bogus and says "check all fields", which is horrible for two reasons. It is unclear what are in "all" fields in the first place, and also we do not look at all fields (e.g. we do not look at atime for obvious reasons). So perhaps When this configuration variable is missing or is set to `default`, many fields in the stat structure are checked to detect if a file has been modified since Git looked at it. Among these fields, when this configuration variable is set to `minimal`, sub-second part of mtime and ctime, the uid and gid of the owner of the file, the inode number (and the device number, if Git was compiled to use it), are excluded from the check, leaving only the whole-second part of mtime (and ctime, if `core.trustCtime` is set) and the filesize to be checked. or something? > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > Documentation/config.txt | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index fd8d27e761..5c41314dd5 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -466,6 +466,8 @@ core.checkStat:: > and work tree. The user can set this to 'default' or > 'minimal'. Default (or explicitly 'default'), is to check > all fields, including the sub-second part of mtime and ctime. > + 'minimal' only checks size and the whole second part of mtime > + and ctime. > > core.quotePath:: > Commands that output paths (e.g. 'ls-files', 'diff'), will ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH] config.txt: clarify core.checkStat = minimal 2018-08-16 17:01 ` Junio C Hamano @ 2018-08-16 18:19 ` Duy Nguyen 2018-08-16 22:29 ` Junio C Hamano 2018-08-17 15:26 ` Junio C Hamano 0 siblings, 2 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-16 18:19 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor, Torsten Bögershausen On Thu, Aug 16, 2018 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote: > > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > > > The description of this key does not really tell what 'minimal' mode > > checks exactly. More information about this mode can be found in the > > commit message of c08e4d5b5c (Enable minimal stat checking - > > 2013-01-22). > > > > While I agree that we need to do _something_, I am not sure if this > change adds sufficient value. I _think_ those who wonder if they > want to configure this want to know what are _not_ looked at > (relative to the "default") more than what are _still_ looked at, > partly because the description of "default" is already bogus and > says "check all fields", which is horrible for two reasons. It is > unclear what are in "all" fields in the first place, and also we do > not look at all fields (e.g. we do not look at atime for obvious > reasons). > > So perhaps > > When this configuration variable is missing or is set to > `default`, many fields in the stat structure are checked to > detect if a file has been modified since Git looked at it. > Among these fields, when this configuration variable is set > to `minimal`, sub-second part of mtime and ctime, the uid > and gid of the owner of the file, the inode number (and the > device number, if Git was compiled to use it), are excluded > from the check, leaving only the whole-second part of mtime > (and ctime, if `core.trustCtime` is set) and the filesize to > be checked. > > or something? Perfect. I could wrap it in a patch, but I feel you should take authorship for that one. I'll leave it to you to create this commit. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH] config.txt: clarify core.checkStat = minimal 2018-08-16 18:19 ` Duy Nguyen @ 2018-08-16 22:29 ` Junio C Hamano 2018-08-17 15:26 ` Junio C Hamano 1 sibling, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-16 22:29 UTC (permalink / raw) To: Duy Nguyen Cc: Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor, Torsten Bögershausen Duy Nguyen <pclouds@gmail.com> writes: > On Thu, Aug 16, 2018 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: >> >> > The description of this key does not really tell what 'minimal' mode >> > checks exactly. More information about this mode can be found in the >> > commit message of c08e4d5b5c (Enable minimal stat checking - >> > 2013-01-22). >> > >> >> While I agree that we need to do _something_, I am not sure if this >> change adds sufficient value. I _think_ those who wonder if they >> want to configure this want to know what are _not_ looked at >> (relative to the "default") more than what are _still_ looked at, >> partly because the description of "default" is already bogus and >> says "check all fields", which is horrible for two reasons. It is >> unclear what are in "all" fields in the first place, and also we do >> not look at all fields (e.g. we do not look at atime for obvious >> reasons). >> >> So perhaps >> >> When this configuration variable is missing or is set to >> `default`, many fields in the stat structure are checked to >> detect if a file has been modified since Git looked at it. >> Among these fields, when this configuration variable is set >> to `minimal`, sub-second part of mtime and ctime, the uid >> and gid of the owner of the file, the inode number (and the >> device number, if Git was compiled to use it), are excluded >> from the check, leaving only the whole-second part of mtime >> (and ctime, if `core.trustCtime` is set) and the filesize to >> be checked. >> >> or something? > > Perfect. I could wrap it in a patch, but I feel you should take > authorship for that one. I'll leave it to you to create this commit. If I find time after today's integration cycle, perhaps I can get to it, but not until then (so the above won't be in today's pushout). Thanks for reading it over. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH] config.txt: clarify core.checkStat = minimal 2018-08-16 18:19 ` Duy Nguyen 2018-08-16 22:29 ` Junio C Hamano @ 2018-08-17 15:26 ` Junio C Hamano 2018-08-17 15:29 ` Duy Nguyen 1 sibling, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-17 15:26 UTC (permalink / raw) To: Duy Nguyen Cc: Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor, Torsten Bögershausen Duy Nguyen <pclouds@gmail.com> writes: > Perfect. I could wrap it in a patch, but I feel you should take > authorship for that one. I'll leave it to you to create this commit. OK, here is what I ended up with. An extra paragraph was taken from the old commit you referrred to, which is probably the only remaining part from your contribution, so the attribution has been demoted to "Helped-by", but your initiative still is appreciated very much. -- >8 -- Subject: [PATCH] config.txt: clarify core.checkStat The description of this key does not really tell what the 'minimal' mode checks and does not check. The description for the 'default' mode is not much better and just says 'all fields', which is unclear and is not even correct (e.g. we do not look at 'atime'). Spell out what are and what are not checked under the 'minimal' mode relative to the 'default' mode to help those who want to decide if they want to use the 'minimal' mode, also taking information about this mode from the commit message of c08e4d5b5c (Enable minimal stat checking - 2013-01-22). Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/config.txt | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..933d719137 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -449,10 +449,20 @@ core.untrackedCache:: See linkgit:git-update-index[1]. `keep` by default. core.checkStat:: - Determines which stat fields to match between the index - and work tree. The user can set this to 'default' or - 'minimal'. Default (or explicitly 'default'), is to check - all fields, including the sub-second part of mtime and ctime. + When missing or is set to `default`, many fields in the stat + structure are checked to detect if a file has been modified + since Git looked at it. When this configuration variable is + set to `minimal`, sub-second part of mtime and ctime, the + uid and gid of the owner of the file, the inode number (and + the device number, if Git was compiled to use it), are + excluded from the check among these fields, leaving only the + whole-second part of mtime (and ctime, if `core.trustCtime` + is set) and the filesize to be checked. ++ +There are implementations of Git that do not leave usable values in +some fields (e.g. JGit); by excluding these fields from the +comparison, the `minimal` mode may help interoperability when the +same repository is used by these other systems at the same time. core.quotePath:: Commands that output paths (e.g. 'ls-files', 'diff'), will -- 2.18.0-666-g63749b2dea ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH] config.txt: clarify core.checkStat = minimal 2018-08-17 15:26 ` Junio C Hamano @ 2018-08-17 15:29 ` Duy Nguyen 0 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-17 15:29 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor, Torsten Bögershausen On Fri, Aug 17, 2018 at 5:26 PM Junio C Hamano <gitster@pobox.com> wrote: > -- >8 -- > Subject: [PATCH] config.txt: clarify core.checkStat > > The description of this key does not really tell what the 'minimal' > mode checks and does not check. The description for the 'default' > mode is not much better and just says 'all fields', which is unclear > and is not even correct (e.g. we do not look at 'atime'). > > Spell out what are and what are not checked under the 'minimal' mode > relative to the 'default' mode to help those who want to decide if > they want to use the 'minimal' mode, also taking information about > this mode from the commit message of c08e4d5b5c (Enable minimal stat > checking - 2013-01-22). Looking good. This does make me want to adjust $GIT_DIR/index format to optionally not store extra fields if we know we're not going to use them. But that's a topic for another day. > Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > Documentation/config.txt | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index ab641bf5a9..933d719137 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -449,10 +449,20 @@ core.untrackedCache:: > See linkgit:git-update-index[1]. `keep` by default. > > core.checkStat:: > - Determines which stat fields to match between the index > - and work tree. The user can set this to 'default' or > - 'minimal'. Default (or explicitly 'default'), is to check > - all fields, including the sub-second part of mtime and ctime. > + When missing or is set to `default`, many fields in the stat > + structure are checked to detect if a file has been modified > + since Git looked at it. When this configuration variable is > + set to `minimal`, sub-second part of mtime and ctime, the > + uid and gid of the owner of the file, the inode number (and > + the device number, if Git was compiled to use it), are > + excluded from the check among these fields, leaving only the > + whole-second part of mtime (and ctime, if `core.trustCtime` > + is set) and the filesize to be checked. > ++ > +There are implementations of Git that do not leave usable values in > +some fields (e.g. JGit); by excluding these fields from the > +comparison, the `minimal` mode may help interoperability when the > +same repository is used by these other systems at the same time. > > core.quotePath:: > Commands that output paths (e.g. 'ls-files', 'diff'), will > -- > 2.18.0-666-g63749b2dea > -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-15 19:08 ` Torsten Bögershausen 2018-08-15 19:35 ` Duy Nguyen @ 2018-08-15 19:38 ` Junio C Hamano 2018-08-16 14:03 ` Torsten Bögershausen 1 sibling, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-15 19:38 UTC (permalink / raw) To: Torsten Bögershausen Cc: Nguyễn Thái Ngọc Duy, git, git, newren, pawelparuzel95, peff, sandals, SZEDER Gábor Torsten Bögershausen <tboegi@web.de> writes: >> + >> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ >> + for (i = 0; i < state->istate->cache_nr; i++) { >> + struct cache_entry *dup = state->istate->cache[i]; >> + >> + if (dup == ce) >> + break; >> + >> + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) >> + continue; >> + > > Should the following be protected by core.checkstat ? > if (check_stat) { I do not think such a if statement is strictly necessary. Even if check_stat tells us "when checking if a cached stat information tells us that the path may have modified, use minimum set of fields from the 'struct stat'", we still capture and update the values from the same "full" set of fields when we mark a cache entry up-to-date. So it all depends on why you are limiting with check_stat. Is it because stdev is unusable? Is it because nsec is unusable? Is it because ino is unusable? Only in the last case, paying attention to check_stat will reduce the false positive. But then you made me wonder what value check_stat has on Windows. If it is false, perhaps we do not even need the conditional compilation, which is a huge plus. >> + if (dup->ce_stat_data.sd_ino == st->st_ino) { >> + dup->ce_flags |= CE_MATCHED; >> + break; >> + } >> + } >> +#endif > > Another thing is that we switch of the ASCII case-folding-detection-logic > off for Windows users, even if we otherwise rely on icase. > I think we can use fspathcmp() as a fallback. when inodes fail, > because we may be on a network file system. > > (I don't have a test setup at the moment, but what happens with inodes > when a Windows machine exports a share to Linux or Mac ?) > > Is there a chance to get the fspathcmp() back, like this ? If fspathcmp() never gives false positives, I do not think we would mind using it like your update. False negatives are fine, as that is better than just punting the whole thing when there is no usable inum. And we do not care all that much if it is more expensive; this is an error codepath after all. And from code structure's point of view, I think it makes sense. It would be even better if we can lose the conditional compilation. Another thing we maybe want to see is if we can update the caller of this function so that we do not overwrite the earlier checkout with the data for this path. When two paths collide, we check out one of the paths without reporting (because we cannot notice), then attempt to check out the other path and report (because we do notice the previous one with lstat()). The current code then goes on and overwrites the file with the contents from the "other" path. Even if we had false negative in this loop, if we leave the contents for the earlier path while reporting the "other" path, then the user can get curious, inspect what contents the "other" path has on the filesystem, and can notice that it belongs to the (unreported--due to false negative) earlier path. > static void mark_colliding_entries(const struct checkout *state, > struct cache_entry *ce, struct stat *st) > { > int i; > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > int folded = 0; > > if (dup == ce) > break; > > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > > if (!fspathcmp(dup->name, ce->name)) > folded = 1; > > #if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino)) > folded = 1; > #endif > if (folded) { > dup->ce_flags |= CE_MATCHED; > break; > } > } > } ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-15 19:38 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano @ 2018-08-16 14:03 ` Torsten Bögershausen 2018-08-16 15:42 ` Duy Nguyen 2018-08-16 16:23 ` Junio C Hamano 0 siblings, 2 replies; 96+ messages in thread From: Torsten Bögershausen @ 2018-08-16 14:03 UTC (permalink / raw) To: Junio C Hamano Cc: Nguyễn Thái Ngọc Duy, git, git, newren, pawelparuzel95, peff, sandals, SZEDER Gábor On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote: This should answer Duys comments as well. > Torsten Bögershausen <tboegi@web.de> writes: > [snip] > > Should the following be protected by core.checkstat ? > > if (check_stat) { > > I do not think such a if statement is strictly necessary. > > Even if check_stat tells us "when checking if a cached stat > information tells us that the path may have modified, use minimum > set of fields from the 'struct stat'", we still capture and update > the values from the same "full" set of fields when we mark a cache > entry up-to-date. So it all depends on why you are limiting with > check_stat. Is it because stdev is unusable? Is it because nsec is > unusable? Is it because ino is unusable? Only in the last case, > paying attention to check_stat will reduce the false positive. > > But then you made me wonder what value check_stat has on Windows. > If it is false, perhaps we do not even need the conditional > compilation, which is a huge plus. Agreed: check_stat is 0 on Windows, and inum is allways 0 in lstat(). I was thinking about systems which don't have inodes and inum, and then generate an inum in memory, sometimes random. After a reboot or a re-mount of the file systems those ino values change. However, for the initial clone we are fine in any case. > > >> + if (dup->ce_stat_data.sd_ino == st->st_ino) { > >> + dup->ce_flags |= CE_MATCHED; > >> + break; > >> + } > >> + } > >> +#endif > > > > Another thing is that we switch of the ASCII case-folding-detection-logic > > off for Windows users, even if we otherwise rely on icase. > > I think we can use fspathcmp() as a fallback. when inodes fail, > > because we may be on a network file system. > > > > (I don't have a test setup at the moment, but what happens with inodes > > when a Windows machine exports a share to Linux or Mac ?) > > > > Is there a chance to get the fspathcmp() back, like this ? > > If fspathcmp() never gives false positives, I do not think we would > mind using it like your update. False negatives are fine, as that > is better than just punting the whole thing when there is no usable > inum. And we do not care all that much if it is more expensive; > this is an error codepath after all. > > And from code structure's point of view, I think it makes sense. It > would be even better if we can lose the conditional compilation. The current implementation of fspathcmp() does not give false positvies, and future versions should not either. All case-insentive file systems have always treated 'a-z' equal to 'A-Z'. In FAT MS/DOS there had only been uppercase letters as file names, and `type file.txt` (the equivilant to ´cat file.txt´ in *nix) simply resultet in `type FILE.TXT` Later, with VFAT and later with HPFS/NTFS a file could be stored on disk as "File.txt". From now on ´type FILE.TXT´ still worked, (and all other upper-lowercase combinations). This all is probably nothing new. The main point should be that fspathcmp() should never return a false positive, and I think we all agree on that. Now back to the compiler switch: Windows always set inum to 0 and I can't think about a situation where a file in a working tree gets inum = 0, can we use the following: static void mark_colliding_entries(const struct checkout *state, struct cache_entry *ce, struct stat *st) { int i; ce->ce_flags |= CE_MATCHED; for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; int folded = 0; if (dup == ce) break; if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; /* * Windows sets ino to 0. On other FS ino = 0 will already be * used, so we don't see it for a file in a Git working tree */ if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino)) folded = 1; /* * Fallback for NTFS and other case insenstive FS, * which don't use POSIX inums */ if (!fspathcmp(dup->name, ce->name)) folded = 1; if (folded) { dup->ce_flags |= CE_MATCHED; break; } } } > > Another thing we maybe want to see is if we can update the caller of > this function so that we do not overwrite the earlier checkout with > the data for this path. When two paths collide, we check out one of > the paths without reporting (because we cannot notice), then attempt > to check out the other path and report (because we do notice the > previous one with lstat()). The current code then goes on and overwrites > the file with the contents from the "other" path. > > Even if we had false negative in this loop, if we leave the contents > for the earlier path while reporting the "other" path, then the user > can get curious, inspect what contents the "other" path has on the > filesystem, and can notice that it belongs to the (unreported--due > to false negative) earlier path. > [snip] ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-16 14:03 ` Torsten Bögershausen @ 2018-08-16 15:42 ` Duy Nguyen 2018-08-16 16:23 ` Junio C Hamano 1 sibling, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-16 15:42 UTC (permalink / raw) To: Torsten Bögershausen Cc: Junio C Hamano, Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor On Thu, Aug 16, 2018 at 4:03 PM Torsten Bögershausen <tboegi@web.de> wrote: > check_stat is 0 on Windows, How? check_stat is 1 by default. And "git init" does not add this key on new repos. > Now back to the compiler switch: > Windows always set inum to 0 and I can't think about a situation where > a file in a working tree gets inum = 0, can we use the following: I did consider using zero inum, but dropped it when thinking about checking all file systems. Are you sure zero inode can't be valid? -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems 2018-08-16 14:03 ` Torsten Bögershausen 2018-08-16 15:42 ` Duy Nguyen @ 2018-08-16 16:23 ` Junio C Hamano 1 sibling, 0 replies; 96+ messages in thread From: Junio C Hamano @ 2018-08-16 16:23 UTC (permalink / raw) To: Torsten Bögershausen Cc: Nguyễn Thái Ngọc Duy, git, git, newren, pawelparuzel95, peff, sandals, SZEDER Gábor Torsten Bögershausen <tboegi@web.de> writes: > check_stat is 0 on Windows, and inum is allways 0 in lstat(). > I was thinking about systems which don't have inodes and inum, > and then generate an inum in memory, sometimes random. > After a reboot or a re-mount of the file systems those ino values > change. > However, for the initial clone we are fine in any case. Yup. > Now back to the compiler switch: > Windows always set inum to 0 and I can't think about a situation where > a file in a working tree gets inum = 0, can we use the following: > > static void mark_colliding_entries(const struct checkout *state, > struct cache_entry *ce, struct stat *st) > { > int i; > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > int folded = 0; > > if (dup == ce) > break; > > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > /* > * Windows sets ino to 0. On other FS ino = 0 will already be > * used, so we don't see it for a file in a Git working tree > */ > if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino)) > folded = 1; Hmm, that is tempting but feels slightly too magical to my taste. Others may easily be able to persuade me to change my mind in this case, though. > /* > * Fallback for NTFS and other case insenstive FS, > * which don't use POSIX inums > */ > if (!fspathcmp(dup->name, ce->name)) > folded = 1; > > if (folded) { > dup->ce_flags |= CE_MATCHED; > break; > } > } > } > > >> >> Another thing we maybe want to see is if we can update the caller of >> this function so that we do not overwrite the earlier checkout with >> the data for this path. When two paths collide, we check out one of >> the paths without reporting (because we cannot notice), then attempt >> to check out the other path and report (because we do notice the >> previous one with lstat()). The current code then goes on and overwrites >> the file with the contents from the "other" path. >> >> Even if we had false negative in this loop, if we leave the contents >> for the earlier path while reporting the "other" path, then the user >> can get curious, inspect what contents the "other" path has on the >> filesystem, and can notice that it belongs to the (unreported--due >> to false negative) earlier path. >> > [snip] ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy ` (2 preceding siblings ...) 2018-08-15 19:08 ` Torsten Bögershausen @ 2018-08-17 16:16 ` Nguyễn Thái Ngọc Duy 2018-08-17 17:20 ` Junio C Hamano ` (2 more replies) 3 siblings, 3 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-08-17 16:16 UTC (permalink / raw) To: pclouds Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- v5 respects core.checkStat and sorts the output case-insensitively. I still don't trust magic st_ino zero, or core.checkStat being zero on Windows, so the #if condition still remains but it covers smallest area possible and I tested it by manually make it "#if 1" The fallback with fspathcmp() is only done when inode can't be trusted because strcmp is more expensive and when fspathcmp() learns more about real world in the future, it could become even more expensive. The output sorting is the result of Sublime-Gitignore repo being reported recently. It's not perfect but it should help seeing the groups in normal case. builtin/clone.c | 1 + cache.h | 1 + entry.c | 31 +++++++++++++++++++++++++++++++ t/t5601-clone.sh | 8 +++++++- unpack-trees.c | 35 +++++++++++++++++++++++++++++++++++ unpack-trees.h | 1 + 6 files changed, 76 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, + clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..8766e27255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i, trust_ino = check_stat; + +#if defined(GIT_WINDOWS_NATIVE) + trust_ino = 0; +#endif + + ce->ce_flags |= CE_MATCHED; + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + (!trust_ino && !fspathcmp(ce->name, dup->name))) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd0680f11e..4338fee3b7 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o) state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -423,6 +429,35 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(&state); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + + if (o->clone) { + struct string_list list = STRING_LIST_INIT_NODUP; + int i; + + for (i = 0; i < index->cache_nr; i++) { + struct cache_entry *ce = index->cache[i]; + + if (!(ce->ce_flags & CE_MATCHED)) + continue; + + string_list_append(&list, ce->name); + ce->ce_flags &= ~CE_MATCHED; + } + + list.cmp = fspathcmp; + string_list_sort(&list); + + if (list.nr) + warning(_("the following paths have collided (e.g. case-sensitive paths\n" + "on a case-insensitive filesystem) and only one from the same\n" + "colliding group is in the working tree:\n")); + + for (i = 0; i < list.nr; i++) + fprintf(stderr, " '%s'\n", list.items[i].string); + + string_list_clear(&list, 0); + } + return errs != 0; } diff --git a/unpack-trees.h b/unpack-trees.h index c2b434c606..d940f1c5c2 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -42,6 +42,7 @@ struct unpack_trees_options { unsigned int reset, merge, update, + clone, index_only, nontrivial_merge, trivial_merges_only, -- 2.18.0.1004.g6639190530 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-08-17 16:16 ` [PATCH v5] " Nguyễn Thái Ngọc Duy @ 2018-08-17 17:20 ` Junio C Hamano 2018-08-17 18:00 ` Duy Nguyen 2018-08-17 19:46 ` Torsten Bögershausen 2018-11-19 8:20 ` Carlo Marcelo Arenas Belón 2 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-08-17 17:20 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > I still don't trust magic st_ino zero, or core.checkStat being zero > on Windows, so the #if condition still remains but it covers smallest > area possible and I tested it by manually make it "#if 1" > > The fallback with fspathcmp() is only done when inode can't be > trusted because strcmp is more expensive and when fspathcmp() learns > more about real world in the future, it could become even more > expensive. > > The output sorting is the result of Sublime-Gitignore repo being > reported recently. It's not perfect but it should help seeing the > groups in normal case. Looks small and safe. > + > + if (o->clone) { > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + > + for (i = 0; i < index->cache_nr; i++) { > + struct cache_entry *ce = index->cache[i]; > + > + if (!(ce->ce_flags & CE_MATCHED)) > + continue; > + > + string_list_append(&list, ce->name); > + ce->ce_flags &= ~CE_MATCHED; > + } > + > + list.cmp = fspathcmp; > + string_list_sort(&list); > + > + if (list.nr) > + warning(_("the following paths have collided (e.g. case-sensitive paths\n" > + "on a case-insensitive filesystem) and only one from the same\n" > + "colliding group is in the working tree:\n")); > + > + for (i = 0; i < list.nr; i++) > + fprintf(stderr, " '%s'\n", list.items[i].string); > + > + string_list_clear(&list, 0); I would have written the "sort, show warning, and list" all inside "if (list.nr)" block, leaving list-clear outside, which would have made the logic a bit cleaner. The reader does not have to bother thinking "ah, when list.nr==0, this is a no-op anyway" to skip them if written that way. I highly suspect that the above was written in that way to reduce the indentation level, but the right way to reduce the indentation level, if it bothers readers too much, is to make the whole thing inside the above if (o->clone) into a dedicated helper function "void report_collided_checkout(void)", I would think. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-08-17 17:20 ` Junio C Hamano @ 2018-08-17 18:00 ` Duy Nguyen 0 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-08-17 18:00 UTC (permalink / raw) To: Junio C Hamano Cc: git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi On Fri, Aug 17, 2018 at 10:20:36AM -0700, Junio C Hamano wrote: > I highly suspect that the above was written in that way to reduce > the indentation level, but the right way to reduce the indentation > level, if it bothers readers too much, is to make the whole thing > inside the above if (o->clone) into a dedicated helper function > "void report_collided_checkout(void)", I would think. I read my mind. I thought of separating into a helper function too, but was not happy that the clearing CE_MATCHED in preparation for this test is in check_updates(), but the cleaning up CE_MATCHED() is in the helper function. So here is the version that separates _both_ phases into helper functions. -- 8< -- Subject: [PATCH v6] clone: report duplicate entries on case-insensitive filesystems Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/clone.c | 1 + cache.h | 1 + entry.c | 31 +++++++++++++++++++++++++++++++ t/t5601-clone.sh | 8 +++++++- unpack-trees.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ unpack-trees.h | 1 + 6 files changed, 88 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, + clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..8766e27255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i, trust_ino = check_stat; + +#if defined(GIT_WINDOWS_NATIVE) + trust_ino = 0; +#endif + + ce->ce_flags |= CE_MATCHED; + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + (!trust_ino && !fspathcmp(ce->name, dup->name))) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +} + /* * Write the contents from ce out to the working tree. * @@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd0680f11e..213da8bbb4 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -345,6 +345,46 @@ static struct progress *get_progress(struct unpack_trees_options *o) return start_delayed_progress(_("Checking out files"), total); } +static void setup_collided_checkout_detection(struct checkout *state, + struct index_state *index) +{ + int i; + + state->clone = 1; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; +} + +static void report_collided_checkout(struct index_state *index) +{ + struct string_list list = STRING_LIST_INIT_NODUP; + int i; + + for (i = 0; i < index->cache_nr; i++) { + struct cache_entry *ce = index->cache[i]; + + if (!(ce->ce_flags & CE_MATCHED)) + continue; + + string_list_append(&list, ce->name); + ce->ce_flags &= ~CE_MATCHED; + } + + list.cmp = fspathcmp; + string_list_sort(&list); + + if (list.nr) { + warning(_("the following paths have collided (e.g. case-sensitive paths\n" + "on a case-insensitive filesystem) and only one from the same\n" + "colliding group is in the working tree:\n")); + + for (i = 0; i < list.nr; i++) + fprintf(stderr, " '%s'\n", list.items[i].string); + } + + string_list_clear(&list, 0); +} + static int check_updates(struct unpack_trees_options *o) { unsigned cnt = 0; @@ -359,6 +399,9 @@ static int check_updates(struct unpack_trees_options *o) state.refresh_cache = 1; state.istate = index; + if (o->clone) + setup_collided_checkout_detection(&state, index); + progress = get_progress(o); if (o->update) @@ -423,6 +466,10 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(&state); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + + if (o->clone) + report_collided_checkout(index); + return errs != 0; } diff --git a/unpack-trees.h b/unpack-trees.h index c2b434c606..d940f1c5c2 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -42,6 +42,7 @@ struct unpack_trees_options { unsigned int reset, merge, update, + clone, index_only, nontrivial_merge, trivial_merges_only, -- 2.18.0.1004.g6639190530 -- 8< -- -- Duy ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-08-17 16:16 ` [PATCH v5] " Nguyễn Thái Ngọc Duy 2018-08-17 17:20 ` Junio C Hamano @ 2018-08-17 19:46 ` Torsten Bögershausen 2018-11-19 8:20 ` Carlo Marcelo Arenas Belón 2 siblings, 0 replies; 96+ messages in thread From: Torsten Bögershausen @ 2018-08-17 19:46 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev On Fri, Aug 17, 2018 at 06:16:45PM +0200, Nguyễn Thái Ngọc Duy wrote: The whole patch looks good to me. (I was just sending a different version, but your version is better :-) One minor remark, should the line warning: the following paths have collided start with a capital letter: Warning: the following paths have collided > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > In the case that we can't rely on filesystem (via inode number) to do > this check, fall back to fspathcmp() which is not perfect but should > not give false positives. > > This patch is tested with vim-colorschemes and Sublime-Gitignore > repositories on a JFS partition with case insensitive support on > Linux. Now even tested under Mac OS/HFS+ [] > ' > > +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' My ambition is to run the test under Windows (both CYGWIN and native) next week, so that we can remove !MINGW and !CYGWIN ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-08-17 16:16 ` [PATCH v5] " Nguyễn Thái Ngọc Duy 2018-08-17 17:20 ` Junio C Hamano 2018-08-17 19:46 ` Torsten Bögershausen @ 2018-11-19 8:20 ` Carlo Marcelo Arenas Belón 2018-11-19 12:28 ` Torsten Bögershausen 2 siblings, 1 reply; 96+ messages in thread From: Carlo Marcelo Arenas Belón @ 2018-11-19 8:20 UTC (permalink / raw) To: git Cc: pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi While I don't have an HFS+ volume to test, I suspect this patch should be needed for both, even if I have to say thay even the broken output was better than the current state. Travis seems to be using a case sensitive filesystem so wouldn't catch this. Was windows/cygwin tested? Carlo -- >8 -- Subject: [PATCH] entry: fix t5061 on macOS b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems", 2018-08-17) was tested on Linux with an excemption for Windows that needs to be expanded for macOS (using APFS), which then would show : $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git warning: the following paths have collided (e.g. case-sensitive paths on a case-insensitive filesystem) and only one from the same colliding group is in the working tree: 'man2/_Exit.2' 'man2/_exit.2' 'man3/NAN.3' 'man3/nan.3' Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- entry.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/entry.c b/entry.c index 5d136c5d55..3845f570f7 100644 --- a/entry.c +++ b/entry.c @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state, { int i, trust_ino = check_stat; -#if defined(GIT_WINDOWS_NATIVE) +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__) trust_ino = 0; #endif -- 2.20.0.rc0 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 8:20 ` Carlo Marcelo Arenas Belón @ 2018-11-19 12:28 ` Torsten Bögershausen 2018-11-19 17:14 ` Carlo Arenas 2018-11-19 17:21 ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones 0 siblings, 2 replies; 96+ messages in thread From: Torsten Bögershausen @ 2018-11-19 12:28 UTC (permalink / raw) To: Carlo Marcelo Arenas Belón, git Cc: pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote: > While I don't have an HFS+ volume to test, I suspect this patch should be > needed for both, even if I have to say thay even the broken output was > better than the current state. > > Travis seems to be using a case sensitive filesystem so wouldn't catch this. > > Was windows/cygwin tested? > > Carlo > -- >8 -- > Subject: [PATCH] entry: fix t5061 on macOS > > b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems", > 2018-08-17) was tested on Linux with an excemption for Windows that needs > to be expanded for macOS (using APFS), which then would show : > > $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git > warning: the following paths have collided (e.g. case-sensitive paths > on a case-insensitive filesystem) and only one from the same > colliding group is in the working tree: > > 'man2/_Exit.2' > 'man2/_exit.2' > 'man3/NAN.3' > 'man3/nan.3' > > Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> > --- > entry.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/entry.c b/entry.c > index 5d136c5d55..3845f570f7 100644 > --- a/entry.c > +++ b/entry.c > @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state, > { > int i, trust_ino = check_stat; > > -#if defined(GIT_WINDOWS_NATIVE) > +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__) > trust_ino = 0; > #endif > > Sorry, but I can't reproduce your problem here. Did you test it on Mac ? If I run t5601 on a case sensitive files system (Mac, mounted NFS, exported from Linux) I get: ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS) And if I run it on a case-insensitive HFS+, I get ok 99 - colliding file detection So what exactly are you trying to fix ? ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 12:28 ` Torsten Bögershausen @ 2018-11-19 17:14 ` Carlo Arenas 2018-11-19 18:24 ` Duy Nguyen 2018-11-19 17:21 ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones 1 sibling, 1 reply; 96+ messages in thread From: Carlo Arenas @ 2018-11-19 17:14 UTC (permalink / raw) To: tboegi Cc: git, pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen <tboegi@web.de> wrote: > > Did you test it on Mac ? macOS 10.14.1 but only using APFS, did you test my patch with HFS+? > So what exactly are you trying to fix ? I get not ok 99 - colliding file detection # # grep X icasefs/warning && # grep x icasefs/warning && # test_i18ngrep "the following paths have collided" icasefs/warning # and the output of "warning" only shows one of the conflicting files, instead of both: Cloning into 'bogus'... done. warning: the following paths have collided (e.g. case-sensitive paths on a case-insensitive filesystem) and only one from the same colliding group is in the working tree: 'x' Carlo ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 17:14 ` Carlo Arenas @ 2018-11-19 18:24 ` Duy Nguyen 2018-11-19 21:03 ` Duy Nguyen 0 siblings, 1 reply; 96+ messages in thread From: Duy Nguyen @ 2018-11-19 18:24 UTC (permalink / raw) To: Carlo Arenas Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor On Mon, Nov 19, 2018 at 6:14 PM Carlo Arenas <carenas@gmail.com> wrote: > > On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen <tboegi@web.de> wrote: > > > > Did you test it on Mac ? > > macOS 10.14.1 but only using APFS, did you test my patch with HFS+? > > > So what exactly are you trying to fix ? > > I get > > not ok 99 - colliding file detection > # > # grep X icasefs/warning && > # grep x icasefs/warning && > # test_i18ngrep "the following paths have collided" icasefs/warning > # > > and the output of "warning" only shows one of the conflicting files, > instead of both: > > Cloning into 'bogus'... > done. > warning: the following paths have collided (e.g. case-sensitive paths > on a case-insensitive filesystem) and only one from the same > colliding group is in the working tree: > > 'x' > > Carlo Could you send me the "index" file in t/trash\ directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of "stat /path/to/icase/bogus/x" My only explanation is somehow the inode value we save is not the same one on disk, which is weird and could even cause other problems. I'd like to know why this happens before trying to fix anything. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 18:24 ` Duy Nguyen @ 2018-11-19 21:03 ` Duy Nguyen 2018-11-19 21:04 ` Duy Nguyen ` (3 more replies) 0 siblings, 4 replies; 96+ messages in thread From: Duy Nguyen @ 2018-11-19 21:03 UTC (permalink / raw) To: Carlo Arenas Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor First of all, Ramsay, it would be great if you could test the below patch and see if it works on Cygwin. I assume since Cygwin shares the underlying filesystem, it will share the same "no trusting inode" issue with native builds (or it calculates inodes anyway using some other source?). Back to the APFS problem... On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote: > Could you send me the "index" file in t/trash\ > directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of > "stat /path/to/icase/bogus/x" > > My only explanation is somehow the inode value we save is not the same > one on disk, which is weird and could even cause other problems. I'd > like to know why this happens before trying to fix anything. Thanks Carlo for the file and "stat" output. The problem is APFS has 64-bit inode (according to the Internet) while we store inodes as 32-bit, so it's truncated. Which means this comparison sd_ino == st_ino is never true because sd_ino is truncated (0x2121063) while st_ino is not (0x202121063). Carlo, it would be great if you could test this patch also with APFS. It should fix problem. We will have to deal with the same truncated inode elsewhere to make sure we index refresh performance does not degrade on APFS. But that's a separate problem. Thank you for bringing this up. -- 8< -- diff --git a/entry.c b/entry.c index 5d136c5d55..809d3e2ba7 100644 --- a/entry.c +++ b/entry.c @@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout *state, { int i, trust_ino = check_stat; -#if defined(GIT_WINDOWS_NATIVE) +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) trust_ino = 0; #endif ce->ce_flags |= CE_MATCHED; - for (i = 0; i < state->istate->cache_nr; i++) { + for (i = 0; i < trust_ino && state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; if (dup == ce) @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state, if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || - (!trust_ino && !fspathcmp(ce->name, dup->name))) { + if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) { dup->ce_flags |= CE_MATCHED; + return; + } + } + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if (!fspathcmp(ce->name, dup->name)) { + dup->ce_flags |= CE_MATCHED; + return; } } } diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index f1a49e94f5..c28d51bd59 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' ' ) ' -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' ' grep X icasefs/warning && grep x icasefs/warning && test_i18ngrep "the following paths have collided" icasefs/warning -- 8< -- ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 21:03 ` Duy Nguyen @ 2018-11-19 21:04 ` Duy Nguyen 2018-11-19 21:17 ` Duy Nguyen ` (2 subsequent siblings) 3 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-11-19 21:04 UTC (permalink / raw) To: Ramsay Jones Cc: Carlo Arenas, Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor ... and I "dear Ramsay" without CCing him.. sigh.. sorry for the noise. On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen <pclouds@gmail.com> wrote: > > First of all, Ramsay, it would be great if you could test the below > patch and see if it works on Cygwin. I assume since Cygwin shares the > underlying filesystem, it will share the same "no trusting inode" > issue with native builds (or it calculates inodes anyway using some > other source?). > > Back to the APFS problem... > > On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote: > > Could you send me the "index" file in t/trash\ > > directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of > > "stat /path/to/icase/bogus/x" > > > > My only explanation is somehow the inode value we save is not the same > > one on disk, which is weird and could even cause other problems. I'd > > like to know why this happens before trying to fix anything. > > Thanks Carlo for the file and "stat" output. The problem is APFS has > 64-bit inode (according to the Internet) while we store inodes as > 32-bit, so it's truncated. Which means this comparison > > sd_ino == st_ino > > is never true because sd_ino is truncated (0x2121063) while st_ino is > not (0x202121063). > > Carlo, it would be great if you could test this patch also with > APFS. It should fix problem. We will have to deal with the same > truncated inode elsewhere to make sure we index refresh performance > does not degrade on APFS. But that's a separate problem. Thank you for > bringing this up. > > -- 8< -- > diff --git a/entry.c b/entry.c > index 5d136c5d55..809d3e2ba7 100644 > --- a/entry.c > +++ b/entry.c > @@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout *state, > { > int i, trust_ino = check_stat; > > -#if defined(GIT_WINDOWS_NATIVE) > +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) > trust_ino = 0; > #endif > > ce->ce_flags |= CE_MATCHED; > > - for (i = 0; i < state->istate->cache_nr; i++) { > + for (i = 0; i < trust_ino && state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > > if (dup == ce) > @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state, > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > > - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || > - (!trust_ino && !fspathcmp(ce->name, dup->name))) { > + if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) { > dup->ce_flags |= CE_MATCHED; > + return; > + } > + } > + > + for (i = 0; i < state->istate->cache_nr; i++) { > + struct cache_entry *dup = state->istate->cache[i]; > + > + if (dup == ce) > break; > + > + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > + continue; > + > + if (!fspathcmp(ce->name, dup->name)) { > + dup->ce_flags |= CE_MATCHED; > + return; > } > } > } > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index f1a49e94f5..c28d51bd59 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' ' > ) > ' > > -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' > +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' ' > grep X icasefs/warning && > grep x icasefs/warning && > test_i18ngrep "the following paths have collided" icasefs/warning > -- 8< -- -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 21:03 ` Duy Nguyen 2018-11-19 21:04 ` Duy Nguyen @ 2018-11-19 21:17 ` Duy Nguyen 2018-11-19 23:29 ` Ramsay Jones 2018-11-20 2:22 ` Junio C Hamano 3 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-11-19 21:17 UTC (permalink / raw) To: Carlo Arenas Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen <pclouds@gmail.com> wrote: > Thanks Carlo for the file and "stat" output. The problem is APFS has > 64-bit inode (according to the Internet) while we store inodes as > 32-bit, so it's truncated. > ... > We will have to deal with the same > truncated inode elsewhere to make sure we index refresh performance > does not degrade on APFS. ... and we don't have a problem there. Either Linus predicted dealing with 64-bit inodes, or he had a habit of casting st_ino to unsigned int, I cannot tell. This code ce->st_ino != (unsigned int)st->st_ino is from e83c516331 (Initial revision of "git", the information manager from hell - 2005-04-07) and it's still used today for comparing sd_ino with st->st_ino in read-cache.c. I guess I should have copied and pasted more often. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 21:03 ` Duy Nguyen 2018-11-19 21:04 ` Duy Nguyen 2018-11-19 21:17 ` Duy Nguyen @ 2018-11-19 23:29 ` Ramsay Jones 2018-11-19 23:54 ` Ramsay Jones 2018-11-20 2:22 ` Junio C Hamano 3 siblings, 1 reply; 96+ messages in thread From: Ramsay Jones @ 2018-11-19 23:29 UTC (permalink / raw) To: Duy Nguyen, Carlo Arenas Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor On 19/11/2018 21:03, Duy Nguyen wrote: > First of all, Ramsay, it would be great if you could test the below > patch and see if it works on Cygwin. I assume since Cygwin shares the > underlying filesystem, it will share the same "no trusting inode" > issue with native builds (or it calculates inodes anyway using some > other source?). Hmm, I have no idea why you would like me to try this patch - care to explain? [I just saw, "Has this been tested on cygwin?" and, since it has been happily passing for some time, responded yes!] Just for the giggles, I removed the !CYGWIN prerequisite from the test and when, as expected, the test failed, had a look around: $ pwd /home/ramsay/git/t/trash directory.t5601-clone $ cat icasefs/warning Cloning into 'bogus'... done. warning: the following paths have collided (e.g. case-sensitive paths on a case-insensitive filesystem) and only one from the same colliding group is in the working tree: 'x' $ cd icasefs/bogus $ ls -l total 0 -rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x $ git ls-files --debug ignoring EOIE extension X ctime: 1542667201:664036600 mtime: 1542667201:663055400 dev: 2378432 ino: 324352 uid: 1001 gid: 513 size: 0 flags: 0 x ctime: 1542667201:665026800 mtime: 1542667201:665026800 dev: 2378432 ino: 324352 uid: 1001 gid: 513 size: 0 flags: 0 $ So, both X and x are in the index with the same inode number. Does that help? ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 23:29 ` Ramsay Jones @ 2018-11-19 23:54 ` Ramsay Jones 2018-11-20 1:05 ` Carlo Arenas 0 siblings, 1 reply; 96+ messages in thread From: Ramsay Jones @ 2018-11-19 23:54 UTC (permalink / raw) To: Duy Nguyen, Carlo Arenas Cc: Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Junio C Hamano, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor On 19/11/2018 23:29, Ramsay Jones wrote: > > > On 19/11/2018 21:03, Duy Nguyen wrote: >> First of all, Ramsay, it would be great if you could test the below >> patch and see if it works on Cygwin. I assume since Cygwin shares the >> underlying filesystem, it will share the same "no trusting inode" >> issue with native builds (or it calculates inodes anyway using some >> other source?). > > Hmm, I have no idea why you would like me to try this patch - care > to explain? [I just saw, "Has this been tested on cygwin?" and, since > it has been happily passing for some time, responded yes!] > > Just for the giggles, I removed the !CYGWIN prerequisite from the > test and when, as expected, the test failed, had a look around: > > $ pwd > /home/ramsay/git/t/trash directory.t5601-clone > $ cat icasefs/warning > Cloning into 'bogus'... > done. > warning: the following paths have collided (e.g. case-sensitive paths > on a case-insensitive filesystem) and only one from the same > colliding group is in the working tree: > > 'x' > $ cd icasefs/bogus > $ ls -l > total 0 > -rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x > $ git ls-files --debug > ignoring EOIE extension > X > ctime: 1542667201:664036600 > mtime: 1542667201:663055400 > dev: 2378432 ino: 324352 > uid: 1001 gid: 513 > size: 0 flags: 0 > x > ctime: 1542667201:665026800 > mtime: 1542667201:665026800 > dev: 2378432 ino: 324352 > uid: 1001 gid: 513 > size: 0 flags: 0 > $ > > So, both X and x are in the index with the same inode number. > > Does that help? Well, I haven't even looked at the patch, but when I apply it to the current 'pu' branch (just what I happened to have checked out) and run that one test: $ ./t5601-clone.sh ... ok 96 - shallow clone locally ok 97 - GIT_TRACE_PACKFILE produces a usable pack ok 98 - clone on case-insensitive fs ok 99 - colliding file detection ok 100 - partial clone ok 101 - partial clone: warn if server does not support object filtering ok 102 - batch missing blob request during checkout ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks # passed all 103 test(s) # SKIP no web server found at '/usr/sbin/apache2' 1..103 $ ... the colliding file detection test passes! ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 23:54 ` Ramsay Jones @ 2018-11-20 1:05 ` Carlo Arenas 0 siblings, 0 replies; 96+ messages in thread From: Carlo Arenas @ 2018-11-20 1:05 UTC (permalink / raw) To: ramsay Cc: pclouds, tboegi, git, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev ok 99 - colliding file detection as well in macOS with APFS Carlo ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 21:03 ` Duy Nguyen ` (2 preceding siblings ...) 2018-11-19 23:29 ` Ramsay Jones @ 2018-11-20 2:22 ` Junio C Hamano 2018-11-20 16:28 ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy 3 siblings, 1 reply; 96+ messages in thread From: Junio C Hamano @ 2018-11-20 2:22 UTC (permalink / raw) To: Duy Nguyen Cc: Carlo Arenas, Torsten Bögershausen, Git Mailing List, Jeff Hostetler, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor Duy Nguyen <pclouds@gmail.com> writes: > > - for (i = 0; i < state->istate->cache_nr; i++) { > + for (i = 0; i < trust_ino && state->istate->cache_nr; i++) { There is some typo here, but modulo that this looks like the right thing to do. > @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state, > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > > - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || > - (!trust_ino && !fspathcmp(ce->name, dup->name))) { > + if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) { This is slightly unfortunate but is the best we can do for now. The reason why the design of the "cached stat info" mechanism allows the sd_* fields to be narrower than the underlying fields is because they are used only as an early-culling measure (if the value saved with truncation is different from the current value with truncation, then they cannot possibly be the same, so we know that the file changed without looking at the contents). This use however is different. Equality of truncated values immediately declare CE_MATCHED here, producing false negative, which is not what we want, no? > dup->ce_flags |= CE_MATCHED; > + return; > + } > + } ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH] clone: fix colliding file detection on APFS 2018-11-20 2:22 ` Junio C Hamano @ 2018-11-20 16:28 ` Nguyễn Thái Ngọc Duy 2018-11-20 19:20 ` Ramsay Jones ` (2 more replies) 0 siblings, 3 replies; 96+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2018-11-20 16:28 UTC (permalink / raw) To: gitster Cc: carenas, git, git, newren, pawelparuzel95, pclouds, peff, sandals, szeder.dev, tboegi, Ramsay Jones Commit b878579ae7 (clone: report duplicate entries on case-insensitive filesystems - 2018-08-17) adds a warning to user when cloning a repo with case-sensitive file names on a case-insensitive file system. The "find duplicate file" check was doing by comparing inode number (and only fall back to fspathcmp() when inode is known to be unreliable because fspathcmp() can't cover all case folding cases). The inode check is very simple, and wrong. It compares between a 32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When an inode is larger than 2^32 (which seems to be the case for APFS), it will be truncated and stored in sd_ino, but comparing with itself will fail. As a result, instead of showing a pair of files that have the same name, we show just one file (marked before the beginning of the loop). We fail to find the original one. The fix could be just a simple type cast (*) dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino but this is no longer a reliable test, there are 4G possible inodes that can match sd_ino because we only match the lower 32 bits instead of full 64 bits. There are two options to go. Either we ignore inode and go with fspathcmp() on Apple platform. This means we can't do accurate inode check on HFS anymore, or even on APFS when inode numbers are still below 2^32. Or we just to to reduce the odds of matching a wrong file by checking more attributes, counting mostly on st_size because st_xtime is likely the same. This patch goes with this direction, hoping that false positive chances are too small to be seen in practice. While at there, enable the test on Cygwin (verified working by Ramsay Jones) (*) this is also already done inside match_stat_data() Reported-by: Carlo Arenas <carenas@gmail.com> Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- So I'm going with match_stat_data(). But I don't know, perhaps just ignoring inode (like Carlo's original patch) is safer/better? Tested on case-insensitive JFS on Linux. But I don't think it really matters because I'm not even sure if I could push inode above 2^32 with this. Hacking JFS for this test sounds fun, but no time for that. entry.c | 4 ++-- t/t5601-clone.sh | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/entry.c b/entry.c index 5d136c5d55..0a3c451f5f 100644 --- a/entry.c +++ b/entry.c @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state, { int i, trust_ino = check_stat; -#if defined(GIT_WINDOWS_NATIVE) +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) trust_ino = 0; #endif @@ -419,7 +419,7 @@ static void mark_colliding_entries(const struct checkout *state, if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + if ((trust_ino && !match_stat_data(&dup->ce_stat_data, st)) || (!trust_ino && !fspathcmp(ce->name, dup->name))) { dup->ce_flags |= CE_MATCHED; break; diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index f1a49e94f5..c28d51bd59 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' ' ) ' -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' ' grep X icasefs/warning && grep x icasefs/warning && test_i18ngrep "the following paths have collided" icasefs/warning -- 2.19.1.1327.g328c130451.dirty ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH] clone: fix colliding file detection on APFS 2018-11-20 16:28 ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy @ 2018-11-20 19:20 ` Ramsay Jones 2018-11-20 19:35 ` Carlo Arenas 2018-11-22 17:59 ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi 2 siblings, 0 replies; 96+ messages in thread From: Ramsay Jones @ 2018-11-20 19:20 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy, gitster Cc: carenas, git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi On 20/11/2018 16:28, Nguyễn Thái Ngọc Duy wrote: > Commit b878579ae7 (clone: report duplicate entries on case-insensitive > filesystems - 2018-08-17) adds a warning to user when cloning a repo > with case-sensitive file names on a case-insensitive file system. The > "find duplicate file" check was doing by comparing inode number (and > only fall back to fspathcmp() when inode is known to be unreliable > because fspathcmp() can't cover all case folding cases). > > The inode check is very simple, and wrong. It compares between a > 32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When > an inode is larger than 2^32 (which seems to be the case for APFS), it > will be truncated and stored in sd_ino, but comparing with itself will > fail. > > As a result, instead of showing a pair of files that have the same > name, we show just one file (marked before the beginning of the > loop). We fail to find the original one. > > The fix could be just a simple type cast (*) > > dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino > > but this is no longer a reliable test, there are 4G possible inodes > that can match sd_ino because we only match the lower 32 bits instead > of full 64 bits. > > There are two options to go. Either we ignore inode and go with > fspathcmp() on Apple platform. This means we can't do accurate inode > check on HFS anymore, or even on APFS when inode numbers are still > below 2^32. > > Or we just to to reduce the odds of matching a wrong file by checking > more attributes, counting mostly on st_size because st_xtime is likely > the same. This patch goes with this direction, hoping that false > positive chances are too small to be seen in practice. > > While at there, enable the test on Cygwin (verified working by Ramsay > Jones) Well, no, I tested the previous version of this patch. However, this patch also passes the test. (Note _test_ singular - in order to check that this patch doesn't cause a regression I would need to run the whole test-suite - that takes 3.5 hours, if I'm not doing anything else!) > > (*) this is also already done inside match_stat_data() > > Reported-by: Carlo Arenas <carenas@gmail.com> > Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > So I'm going with match_stat_data(). But I don't know, perhaps just > ignoring inode (like Carlo's original patch) is safer/better? > > Tested on case-insensitive JFS on Linux. But I don't think it really > matters because I'm not even sure if I could push inode above 2^32 > with this. Hacking JFS for this test sounds fun, but no time for that. > > entry.c | 4 ++-- > t/t5601-clone.sh | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/entry.c b/entry.c > index 5d136c5d55..0a3c451f5f 100644 > --- a/entry.c > +++ b/entry.c > @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state, > { > int i, trust_ino = check_stat; > > -#if defined(GIT_WINDOWS_NATIVE) > +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) I was a little curious about this (but couldn't be bothered actually read the code, post-application), so I removed this hunk from the patch, rebuilt and ran the test again: it _passed_ the test. :-D So, ... ATB, Ramsay Jones > trust_ino = 0; > #endif > > @@ -419,7 +419,7 @@ static void mark_colliding_entries(const struct checkout *state, > if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) > continue; > > - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || > + if ((trust_ino && !match_stat_data(&dup->ce_stat_data, st)) || > (!trust_ino && !fspathcmp(ce->name, dup->name))) { > dup->ce_flags |= CE_MATCHED; > break; > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index f1a49e94f5..c28d51bd59 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' ' > ) > ' > > -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' > +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' ' > grep X icasefs/warning && > grep x icasefs/warning && > test_i18ngrep "the following paths have collided" icasefs/warning > ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH] clone: fix colliding file detection on APFS 2018-11-20 16:28 ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy 2018-11-20 19:20 ` Ramsay Jones @ 2018-11-20 19:35 ` Carlo Arenas 2018-11-20 19:38 ` Duy Nguyen 2018-11-22 17:59 ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi 2 siblings, 1 reply; 96+ messages in thread From: Carlo Arenas @ 2018-11-20 19:35 UTC (permalink / raw) To: pclouds Cc: gitster, git, git, newren, pawelparuzel95, peff, sandals, szeder.dev, tboegi, ramsay Tested-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> in macOS 10.14.1 with APFS in Linux using VFAT (for the lulz) IMHO it would be ideal if test would be enabled/validated for windows (native, not only cygwin) as it might even work without the override and if we are to see conflicts, that is probably where most users with file insensitive filesystems might be found Carlo ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH] clone: fix colliding file detection on APFS 2018-11-20 19:35 ` Carlo Arenas @ 2018-11-20 19:38 ` Duy Nguyen 0 siblings, 0 replies; 96+ messages in thread From: Duy Nguyen @ 2018-11-20 19:38 UTC (permalink / raw) To: Carlo Arenas Cc: Junio C Hamano, Jeff Hostetler, Git Mailing List, Elijah Newren, Paweł Paruzel, Jeff King, brian m. carlson, SZEDER Gábor, Torsten Bögershausen, Ramsay Jones On Tue, Nov 20, 2018 at 8:35 PM Carlo Arenas <carenas@gmail.com> wrote: > IMHO it would be ideal if test would be enabled/validated for windows > (native, not only cygwin) as it might even work without the override > and if we are to see conflicts, that is probably where most users with > file insensitive filesystems might be found Yes but I can't test on Windows so I will not enable the test until I got a report that it's working there. -- Duy ^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW 2018-11-20 16:28 ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy 2018-11-20 19:20 ` Ramsay Jones 2018-11-20 19:35 ` Carlo Arenas @ 2018-11-22 17:59 ` tboegi 2018-11-22 20:16 ` Carlo Marcelo Arenas Belón 2 siblings, 1 reply; 96+ messages in thread From: tboegi @ 2018-11-22 17:59 UTC (permalink / raw) To: git Cc: carenas, git, newren, pawelparuzel95, pclouds, peff, sandals, szeder.dev, tboegi, ramsay From: Torsten Bögershausen <tboegi@web.de> Commit b878579ae7 (clone: report duplicate entries on case-insensitive filesystems - 2018-08-17) adds a warning to user when cloning a repo with case-sensitive file names on a case-insensitive file system. This test has never been enabled for MINGW. It had been working since day 1, but I forget to report that to the author. Enable it after a re-test. Signed-off-by: Torsten Bögershausen <tboegi@web.de> --- The other day, I wanted to test Duys patch - under MINGW - to see if the problem is catch(ed) but hehe git am failed to apply - not a big desaster, because is is already in master Here is a follow-up, end we can end the match t/t5601-clone.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index c28d51bd59..8bbc7068ac 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' ' ) ' -test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' ' +test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' ' grep X icasefs/warning && grep x icasefs/warning && test_i18ngrep "the following paths have collided" icasefs/warning -- 2.19.0.271.gfe8321ec05 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW 2018-11-22 17:59 ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi @ 2018-11-22 20:16 ` Carlo Marcelo Arenas Belón 2018-11-23 11:24 ` Johannes Schindelin 0 siblings, 1 reply; 96+ messages in thread From: Carlo Marcelo Arenas Belón @ 2018-11-22 20:16 UTC (permalink / raw) To: tboegi Cc: git, git, newren, pawelparuzel95, pclouds, peff, ramsay, sandals, szeder.dev Which FS was this tested on?, is Git LFS I keep hearing about also considered a "filesystem" for git? Could you also test with the following applied on top? Carlo -- >8 -- Subject: [PATCH] entry: remove windows fallback to inode checking this test is really FS specific, so is better to avoid any compiled assumptions about the platform and let the user drive the fallback through core.checkStat instead Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- entry.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/entry.c b/entry.c index 0a3c451f5f..5ae74856e6 100644 --- a/entry.c +++ b/entry.c @@ -404,10 +404,6 @@ static void mark_colliding_entries(const struct checkout *state, { int i, trust_ino = check_stat; -#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) - trust_ino = 0; -#endif - ce->ce_flags |= CE_MATCHED; for (i = 0; i < state->istate->cache_nr; i++) { -- 2.20.0.rc1 ^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW 2018-11-22 20:16 ` Carlo Marcelo Arenas Belón @ 2018-11-23 11:24 ` Johannes Schindelin 0 siblings, 0 replies; 96+ messages in thread From: Johannes Schindelin @ 2018-11-23 11:24 UTC (permalink / raw) To: Carlo Marcelo Arenas Belón Cc: tboegi, git, git, newren, pawelparuzel95, pclouds, peff, ramsay, sandals, szeder.dev [-- Attachment #1: Type: text/plain, Size: 1623 bytes --] Hi Carlo, On Thu, 22 Nov 2018, Carlo Marcelo Arenas Belón wrote: > Subject: [PATCH] entry: remove windows fallback to inode checking > > this test is really FS specific, so is better to avoid any compiled > assumptions about the platform and let the user drive the fallback > through core.checkStat instead > > Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> > --- > entry.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/entry.c b/entry.c > index 0a3c451f5f..5ae74856e6 100644 > --- a/entry.c > +++ b/entry.c > @@ -404,10 +404,6 @@ static void mark_colliding_entries(const struct checkout *state, > { > int i, trust_ino = check_stat; > > -#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) > - trust_ino = 0; > -#endif > - No, we cannot drop this. You may not see it in git.git's source code, but in Git for Windows' patches, we have an experimental feature (which had seemed to stabilize, but Ben Peart is currently doing wonders with it, improving the performance substantially) for accelerating the file metadata enumeration in a noticeable manner. The only way we can do that is by *not* insisting on a correct inode. Besides, IIRC even our regular stat() now "fails" to fill the inode field. So no, we cannot do that. We can probably drop the `|| defined(__CYGINW__)` part (Cygwin even generates a fake inode for FAT, where no equivalent is available, by hashing the full normalized path). But you cannot drop the `GIT_WINDOWS_NATIVE` part. Ciao, Johannes > ce->ce_flags |= CE_MATCHED; > > for (i = 0; i < state->istate->cache_nr; i++) { > -- > 2.20.0.rc1 > > ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 12:28 ` Torsten Bögershausen 2018-11-19 17:14 ` Carlo Arenas @ 2018-11-19 17:21 ` Ramsay Jones 2018-11-19 19:39 ` Carlo Arenas 1 sibling, 1 reply; 96+ messages in thread From: Ramsay Jones @ 2018-11-19 17:21 UTC (permalink / raw) To: Torsten Bögershausen, Carlo Marcelo Arenas Belón, git Cc: pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev On 19/11/2018 12:28, Torsten Bögershausen wrote: > On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote: >> While I don't have an HFS+ volume to test, I suspect this patch should be >> needed for both, even if I have to say thay even the broken output was >> better than the current state. >> >> Travis seems to be using a case sensitive filesystem so wouldn't catch this. >> >> Was windows/cygwin tested? >> >> Carlo >> -- >8 -- >> Subject: [PATCH] entry: fix t5061 on macOS >> >> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems", >> 2018-08-17) was tested on Linux with an excemption for Windows that needs >> to be expanded for macOS (using APFS), which then would show : >> >> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git >> warning: the following paths have collided (e.g. case-sensitive paths >> on a case-insensitive filesystem) and only one from the same >> colliding group is in the working tree: >> >> 'man2/_Exit.2' >> 'man2/_exit.2' >> 'man3/NAN.3' >> 'man3/nan.3' >> >> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> >> --- >> entry.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/entry.c b/entry.c >> index 5d136c5d55..3845f570f7 100644 >> --- a/entry.c >> +++ b/entry.c >> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state, >> { >> int i, trust_ino = check_stat; >> >> -#if defined(GIT_WINDOWS_NATIVE) >> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__) >> trust_ino = 0; >> #endif >> >> > > Sorry, > but I can't reproduce your problem here. > > Did you test it on Mac ? > If I run t5601 on a case sensitive files system > (Mac, mounted NFS, exported from Linux) > I get: > ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of > !MINGW,!CYGWIN,CASE_INSENSITIVE_FS) I tested v2.20.0-rc0 on cygwin last night and it passed just fine. I just ran t5601-clone.sh on its own and got: $ ./t5601-clone.sh ... ok 98 - clone on case-insensitive fs ok 99 # skip colliding file detection (missing !CYGWIN of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS) ok 100 - partial clone ok 101 - partial clone: warn if server does not support object filtering ok 102 - batch missing blob request during checkout ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks # passed all 103 test(s) # SKIP no web server found at '/usr/sbin/apache2' 1..103 $ ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems 2018-11-19 17:21 ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones @ 2018-11-19 19:39 ` Carlo Arenas 0 siblings, 0 replies; 96+ messages in thread From: Carlo Arenas @ 2018-11-19 19:39 UTC (permalink / raw) To: ramsay Cc: tboegi, git, pclouds, git, gitster, newren, pawelparuzel95, peff, sandals, szeder.dev On Mon, Nov 19, 2018 at 9:23 AM Ramsay Jones <ramsay@ramsayjones.plus.com> wrote: > ok 99 # skip colliding file detection (missing !CYGWIN of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS) you need to enable this specific test first (removing !CYGWIN) so it doesn't get skipped Carlo ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: Git clone and case sensitivity 2018-07-29 9:28 ` Jeff King 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy @ 2018-07-31 19:39 ` Jeff Hostetler 1 sibling, 0 replies; 96+ messages in thread From: Jeff Hostetler @ 2018-07-31 19:39 UTC (permalink / raw) To: Jeff King, Duy Nguyen Cc: brian m. carlson, Paweł Paruzel, Git Mailing List On 7/29/2018 5:28 AM, Jeff King wrote: > On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote: > >>> strcasecmp() will only catch a subset of the cases. We really need to >>> follow the same folding rules that the filesystem would. >> >> True. But that's how we handle case insensitivity internally. If a >> filesytem has more sophisticated folding rules then git will not work >> well on that one anyway. > > Hrm. Yeah, I guess that's the best we can do for the actual in-memory > checks. Everything else depends on doing an actual filesystem operation, > and our icase stuff kicks in way before then. I was mostly thinking of > HFS+ utf8 normalization weirdness, but I guess people are accustomed to > that by now. > >>> For the case of clone, I actually wonder if we could detect during the >>> checkout step that a file already exists. Since we know that the >>> directory we started with was empty, then if it does, either: >>> >>> - there's some funny case-folding going on that means two paths in the >>> repository map to the same name in the filesystem; or >>> >>> - somebody else is writing to the directory at the same time as us >> >> This is exactly what my first patch does (minus the sparse checkout >> part). > > Right, sorry, I should have read that one more carefully. > >> But without knowing the exact folding rules, I don't think we can >> locate this "somebody else" who wrote the first path. So if N paths >> are treated the same by this filesystem, we could only report N-1 of >> them. >> >> If we want to report just one path when this happens though, then this >> works quite well. > > Hmm. Since most such systems are case-preserving, would it be possible > to report the name of the existing file? Doing it via opendir/readdir is > hacky, and anyway puts the burden on us to find the matching name. Doing > it via fstat() on the opened file doesn't work because at that the > filesystem has resolved the name to an inode. > > So yeah, perhaps strcasecmp() is the best we can do (I do agree that > being able to mention all of the conflicting names is a benefit). > > I guess we should be using fspathcmp(), though, in case it later learns > to be smarter. > > -Peff > As has already been mentioned, this gets into weird territory really fast, between case folding, final space/dot on windows, utf8 NFC/NFD weirdness on the mac, utf8 invisible chars on the mac, long/short names on windows, and etc. And that's just for filenames. Things really get weird if directory names have these ambiguities. Perhaps just print the problematic paths (where the collision is detected) and let the user decide how to correct them. Perhaps we could have a separate tool that could scan the index or commit for potential conflicts and warn them in advance (granted, it might not be perfect and may report a few false positives). Forcing them into a sparse-checkout situation might be over their skill level. Jeff ^ permalink raw reply [flat|nested] 96+ messages in thread
end of thread, other threads:[~2018-11-23 11:24 UTC | newest] Thread overview: 96+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-07-27 9:59 Git clone and case sensitivity Paweł Paruzel 2018-07-27 20:59 ` brian m. carlson 2018-07-28 4:36 ` Duy Nguyen 2018-07-28 4:45 ` Duy Nguyen 2018-07-28 4:48 ` Jeff King 2018-07-28 5:11 ` Duy Nguyen 2018-07-28 9:48 ` Simon Ruderich 2018-07-28 9:56 ` Jeff King 2018-07-28 18:05 ` brian m. carlson 2018-07-29 5:26 ` Duy Nguyen 2018-07-29 9:28 ` Jeff King 2018-07-30 15:27 ` [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-07-31 18:23 ` Torsten Bögershausen 2018-08-01 15:25 ` Duy Nguyen 2018-07-31 18:44 ` Elijah Newren 2018-07-31 19:12 ` Junio C Hamano 2018-07-31 19:29 ` Jeff King 2018-07-31 20:12 ` Junio C Hamano 2018-07-31 20:37 ` Jeff King 2018-07-31 20:57 ` Junio C Hamano 2018-08-01 21:20 ` Junio C Hamano 2018-08-02 14:43 ` Duy Nguyen 2018-08-02 16:27 ` Junio C Hamano 2018-08-02 19:06 ` Jeff King 2018-08-02 21:14 ` Junio C Hamano 2018-08-02 21:28 ` Jeff King 2018-08-03 18:23 ` Jeff Hostetler 2018-08-03 18:49 ` Junio C Hamano 2018-08-03 18:53 ` Jeff King 2018-08-05 14:01 ` Jeff Hostetler 2018-08-03 14:28 ` Torsten Bögershausen 2018-08-01 15:21 ` Duy Nguyen 2018-07-31 19:13 ` Junio C Hamano 2018-08-01 15:16 ` Duy Nguyen 2018-08-07 19:01 ` [PATCH v2] " Nguyễn Thái Ngọc Duy 2018-08-07 19:31 ` Junio C Hamano 2018-08-08 19:48 ` Jeff Hostetler 2018-08-08 22:31 ` Jeff King 2018-08-09 0:41 ` Junio C Hamano 2018-08-09 14:23 ` Jeff King 2018-08-09 21:14 ` Jeff Hostetler 2018-08-09 21:34 ` Jeff King 2018-08-09 21:40 ` Elijah Newren 2018-08-09 21:44 ` Jeff King 2018-08-09 21:53 ` Elijah Newren 2018-08-09 21:59 ` Jeff King 2018-08-09 23:05 ` Elijah Newren 2018-08-09 22:07 ` Junio C Hamano 2018-08-10 15:36 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Nguyễn Thái Ngọc Duy 2018-08-10 15:36 ` [PATCH v3 1/1] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-08-10 16:42 ` Junio C Hamano 2018-08-11 10:09 ` SZEDER Gábor 2018-08-11 13:16 ` Duy Nguyen 2018-08-13 16:55 ` Junio C Hamano 2018-08-13 17:12 ` Duy Nguyen 2018-08-10 16:12 ` [PATCH v3 0/1] clone: warn on colidding entries on checkout Junio C Hamano 2018-08-12 9:07 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Nguyễn Thái Ngọc Duy 2018-08-13 15:32 ` Jeff Hostetler 2018-08-13 17:18 ` Junio C Hamano 2018-08-15 19:08 ` Torsten Bögershausen 2018-08-15 19:35 ` Duy Nguyen 2018-08-16 15:56 ` [PATCH] config.txt: clarify core.checkStat = minimal Nguyễn Thái Ngọc Duy 2018-08-16 17:01 ` Junio C Hamano 2018-08-16 18:19 ` Duy Nguyen 2018-08-16 22:29 ` Junio C Hamano 2018-08-17 15:26 ` Junio C Hamano 2018-08-17 15:29 ` Duy Nguyen 2018-08-15 19:38 ` [PATCH v4] clone: report duplicate entries on case-insensitive filesystems Junio C Hamano 2018-08-16 14:03 ` Torsten Bögershausen 2018-08-16 15:42 ` Duy Nguyen 2018-08-16 16:23 ` Junio C Hamano 2018-08-17 16:16 ` [PATCH v5] " Nguyễn Thái Ngọc Duy 2018-08-17 17:20 ` Junio C Hamano 2018-08-17 18:00 ` Duy Nguyen 2018-08-17 19:46 ` Torsten Bögershausen 2018-11-19 8:20 ` Carlo Marcelo Arenas Belón 2018-11-19 12:28 ` Torsten Bögershausen 2018-11-19 17:14 ` Carlo Arenas 2018-11-19 18:24 ` Duy Nguyen 2018-11-19 21:03 ` Duy Nguyen 2018-11-19 21:04 ` Duy Nguyen 2018-11-19 21:17 ` Duy Nguyen 2018-11-19 23:29 ` Ramsay Jones 2018-11-19 23:54 ` Ramsay Jones 2018-11-20 1:05 ` Carlo Arenas 2018-11-20 2:22 ` Junio C Hamano 2018-11-20 16:28 ` [PATCH] clone: fix colliding file detection on APFS Nguyễn Thái Ngọc Duy 2018-11-20 19:20 ` Ramsay Jones 2018-11-20 19:35 ` Carlo Arenas 2018-11-20 19:38 ` Duy Nguyen 2018-11-22 17:59 ` [PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW tboegi 2018-11-22 20:16 ` Carlo Marcelo Arenas Belón 2018-11-23 11:24 ` Johannes Schindelin 2018-11-19 17:21 ` [PATCH v5] clone: report duplicate entries on case-insensitive filesystems Ramsay Jones 2018-11-19 19:39 ` Carlo Arenas 2018-07-31 19:39 ` Git clone and case sensitivity Jeff Hostetler
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).