From: "Torsten Bögershausen" <tboegi@web.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	git@jeffhostetler.com, git@vger.kernel.org, newren@gmail.com,
	pawelparuzel95@gmail.com, peff@peff.net,
	sandals@crustytoothpaste.net,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
Date: Thu, 16 Aug 2018 16:03:12 +0200
Message-ID: <20180816140312.GA6102@tor.lan>
In-Reply-To: <xmqqtvnvh12u.fsf@gitster-ct.c.googlers.com>

On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:

This should answer Duy's comments as well.
> Torsten Bögershausen <tboegi@web.de> writes:
> 
[snip]
> > Should the following be protected by core.checkstat ? 
> > 	if (check_stat) {
> 
> I do not think such a if statement is strictly necessary.
> 
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date.  So it all depends on why you are limiting with
> check_stat.  Is it because stdev is unusable?  Is it because nsec is
> unusable?  Is it because ino is unusable?  Only in the last case,
> paying attention to check_stat will reduce the false positive.
> 
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.

Agreed:
check_stat is 0 on Windows, and lstat() always reports an inum of 0 there.
I was thinking about systems which don't have inodes and therefore
generate an inum in memory, sometimes at random.
After a reboot or a re-mount of the file system those ino values
change.
However, for the initial clone we are fine in any case.
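
To make the inode argument concrete, here is a minimal POSIX sketch of an
inode-based "same file?" check. The helper name same_file_by_inode() is
hypothetical and not part of Git; it only illustrates why an inode number
of 0 has to be treated as "no information" rather than as a match.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not Git code.
 * Returns 1 if both paths name the same inode on the same device,
 * 0 if they do not, and -1 if no answer is possible: lstat() failed,
 * or an inode number is 0 and therefore carries no information.
 */
static int same_file_by_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a && b);
	if (lstat(a, &sa) || lstat(b, &sb))
		return -1;
	if (!sa.st_ino || !sb.st_ino)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On a case-insensitive file system with real inodes, calling this with
"README" and "readme" would be expected to return 1 when both names reach
the same file; where lstat() reports inode 0, it returns -1 and a
name-based comparison is needed instead.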

> 
> >> +		if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> +			dup->ce_flags |= CE_MATCHED;
> >> +			break;
> >> +		}
> >> +	}
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
> 
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update.  False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum.  And we do not care all that much if it is more expensive;
> this is an error codepath after all.
> 
> And from code structure's point of view, I think it makes sense.  It
> would be even better if we can lose the conditional compilation.

The current implementation of fspathcmp() does not give false positives,
and future versions should not either.
All case-insensitive file systems have always treated 'a-z' as equal to 'A-Z'.
In FAT under MS-DOS, file names consisted of uppercase letters only,
and `type file.txt` (the equivalent of `cat file.txt` in *nix)
simply resulted in `type FILE.TXT`.
Later, with VFAT and then HPFS/NTFS, a file could be stored on
disk as "File.txt".
From then on, `type FILE.TXT` still worked (as did all other upper/lowercase
combinations).
None of this is probably new.
The main point is that fspathcmp() should never return a false positive,
and I think we all agree on that.
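
As an illustration of that property, here is a minimal sketch of an
ASCII-only case-folding comparison. It is not Git's fspathcmp(); the names
ascii_fold() and ascii_icase_cmp() are made up for this example. Because
only 'A-Z' is folded onto 'a-z', it can miss collisions between non-ASCII
names (a false negative), but it never reports two names as equal unless
they differ only in ASCII letter case.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only 'A'-'Z' to 'a'-'z'; every other byte is left untouched. */
static unsigned char ascii_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/*
 * Hypothetical stand-in for a case-insensitive path comparison.
 * Returns 0 when the two names differ at most in ASCII letter case,
 * nonzero otherwise (same convention as strcmp()).
 */
static int ascii_icase_cmp(const char *a, const char *b)
{
	assert(a && b);
	for (;; a++, b++) {
		unsigned char ca = ascii_fold((unsigned char)*a);
		unsigned char cb = ascii_fold((unsigned char)*b);

		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (!ca)
			return 0;	/* both strings ended together */
	}
}
```

With this sketch, "File.txt" and "FILE.TXT" compare equal, while the UTF-8
encodings of a lowercase and an uppercase a-umlaut compare as different,
even on a file system that would fold them onto the same file.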


Now back to the compiler switch:
Windows always sets inum to 0, and I can't think of a situation where
a file in a working tree gets inum = 0. Can we use the following?

static void mark_colliding_entries(const struct checkout *state,
				   struct cache_entry *ce, struct stat *st)
{
	int i;
	ce->ce_flags |= CE_MATCHED;

	for (i = 0; i < state->istate->cache_nr; i++) {
		struct cache_entry *dup = state->istate->cache[i];
		int folded = 0;

		if (dup == ce)
			break;

		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
			continue;
		/*
		 * Windows sets ino to 0. On other file systems ino 0 is
		 * already in use, so we never see it for a file in a Git
		 * working tree.
		 */
		if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
			folded = 1;

		/*
		 * Fallback for NTFS and other case-insensitive file systems
		 * which don't use POSIX inums.
		 */
		if (!fspathcmp(dup->name, ce->name))
			folded = 1;

		if (folded) {
			dup->ce_flags |= CE_MATCHED;
			break;
		}
	}
}
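
For anyone who wants to try the logic outside of Git, here is a
self-contained model of the loop above. Everything in it is a simplified
stand-in and an assumption of this sketch: struct demo_entry replaces
struct cache_entry, DEMO_MATCHED replaces CE_MATCHED, POSIX strcasecmp()
replaces fspathcmp(), and the CE_VALID / CE_SKIP_WORKTREE checks are left
out. It also returns the index of the colliding entry, which the real
function does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp(), POSIX */

#define DEMO_MATCHED 0x1	/* stand-in for CE_MATCHED */

struct demo_entry {
	const char *name;
	unsigned long ino;	/* inode recorded when the entry was checked out */
	unsigned flags;
};

/*
 * Mark `ce` and the first earlier entry that occupies the same file.
 * `st_ino` is the inode lstat() reported for the path that already
 * exists on disk; 0 means "no usable inode" (as on Windows).
 * Returns the index of the colliding entry, or -1 if none was found.
 */
static int demo_mark_colliding(struct demo_entry *entries, size_t nr,
			       struct demo_entry *ce, unsigned long st_ino)
{
	size_t i;

	assert(entries && ce);
	ce->flags |= DEMO_MATCHED;

	for (i = 0; i < nr; i++) {
		struct demo_entry *dup = &entries[i];

		if (dup == ce)
			break;	/* only entries before ce can collide */
		if (dup->flags & DEMO_MATCHED)
			continue;
		if ((st_ino && dup->ino == st_ino) ||
		    !strcasecmp(dup->name, ce->name)) {
			dup->flags |= DEMO_MATCHED;
			return (int)i;
		}
	}
	return -1;
}
```

With entries "README" (ino 11), "Makefile" (ino 12) and "readme", checking
out "readme" while lstat() reports ino 11 marks "README" via the inode
test; with all inodes 0, "a.txt" and "A.TXT" are still matched through the
name comparison.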


> 
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for this path.  When two paths collide, we check out one of
> the paths without reporting (because we cannot notice), then attempt
> to check out the other path and report (because we do notice the
> previous one with lstat()).  The current code then goes on and overwrites
> the file with the contents from the "other" path.
> 
> Even if we had false negative in this loop, if we leave the contents
> for the earlier path while reporting the "other" path, then the user
> can get curious, inspect what contents the "other" path has on the
> filesystem, and can notice that it belongs to the (unreported--due
> to false negative) earlier path.
> 
[snip]
