git@vger.kernel.org mailing list mirror (one of many)
From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: "Duy Nguyen" <pclouds@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Paweł Paruzel" <pawelparuzel95@gmail.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
Date: Fri, 3 Aug 2018 14:23:17 -0400
Message-ID: <5b17454b-7fa7-7a9c-92d9-214e6e697785@jeffhostetler.com>
In-Reply-To: <20180802212819.GA32538@sigill.intra.peff.net>



On 8/2/2018 5:28 PM, Jeff King wrote:
> On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote:
> 
>> Jeff King <peff@peff.net> writes:
>>
>>> I also wonder if Windows could return some other file-unique identifier
>>> that would work in place of an inode here. That would be pretty easy to
>>> swap in via an #ifdef'd helper function. I'd be OK shipping without that
>>> and letting Windows folks fill it in later (as long as we do not do
>>> anything too stupid until then, like claim all of the inode==0 files are
>>> the same).
>>
>> Yeah, but such a useful file-unique identifier would probably be
>> used in place of inum in their (l)stat emulation already, if it exists,
>> no?
> 
> Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
> maybe it's simply impossible. I don't know much about Windows. Some
> searching implies that NTFS does have a "file index" concept which is
> supposed to be unique.

This is hard and/or expensive on Windows.  Yes, you can get the
"file index" values for an open file handle with a cost similar to
an fstat().  Unfortunately, the FindFirst/FindNext routines (equivalent
to the opendir/readdir routines) don't give you that data.  So we'd
have to scan the directory and then open and stat each file.  This is
terribly expensive on Windows -- and the reason we have the fscache
layer (in the GfW version) to intercept the lstat() calls whenever
possible.
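
A rough sketch of the "scan the directory, then stat each file" approach,
written in Python rather than Git's C purely for illustration.  The helper
name is made up here; none of this is Git or GfW code.  It assumes that
os.lstat() returns a meaningful st_ino on the platform, and it makes one
stat call per directory entry, which is the cost described above.  The hard
link in the demo exists only so that two names share an inode on a
case-sensitive filesystem.

```python
import os
import tempfile
from collections import defaultdict


def names_sharing_a_file(directory):
    """Return sorted groups of entry names that refer to the same file."""
    groups = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            # The directory scan alone does not supply a file identifier
            # everywhere, so each entry gets its own lstat() call.
            st = os.lstat(entry.path)
            groups[(st.st_dev, st.st_ino)].append(entry.name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


with tempfile.TemporaryDirectory() as demo_dir:
    original = os.path.join(demo_dir, "foo.txt")
    with open(original, "w") as f:
        f.write("x\n")
    # Second name for the same file; stands in for a case-folded duplicate.
    os.link(original, os.path.join(demo_dir, "FOO-link.txt"))
    with open(os.path.join(demo_dir, "bar.txt"), "w") as f:
        f.write("y\n")
    shared = names_sharing_a_file(demo_dir)
```

Python's own documentation notes that on Windows the stat result cached by
os.scandir() has st_ino set to zero, so the separate lstat() call is needed
there as well.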

It might be possible to use the NTFS Master File Table to discover
this (very big handwave), but I would need to do a little digging.

This would all be NTFS specific.  FAT and other volume types would not
be covered.

Another thing to keep in mind is that the collision could be because
of case folding (or other such nonsense) on a directory in the path.
I mean, if someone on Linux builds a commit containing:

     a/b/c/D/e/foo.txt
     a/b/c/d/e/foo.txt

we'll get the same kind of collision as if one of them were spelled "FOO.txt".
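
A minimal sketch of spotting such collisions from the path list alone, by
folding every component rather than just the file name.  The function name
is hypothetical, and str.casefold() is only an approximation: real
filesystems apply their own folding and normalization rules, which this
does not model.

```python
from collections import defaultdict


def find_folded_collisions(paths):
    """Group paths that become identical once each component is case-folded."""
    groups = defaultdict(list)
    for path in paths:
        # Fold each component, so a directory spelled "D" vs "d" counts
        # exactly like a file spelled "FOO.txt" vs "foo.txt".
        key = "/".join(part.casefold() for part in path.split("/"))
        groups[key].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


paths = [
    "a/b/c/D/e/foo.txt",
    "a/b/c/d/e/foo.txt",
    "a/b/c/d/e/bar.txt",
]
collisions = find_folded_collisions(paths)
```

Here the two foo.txt paths end up in one group even though their file names
are spelled identically; only the directory component differs in case.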

Also, do we need to worry about hard-links or symlinks here?
If checkout populates symlinks, then you might have another collision
opportunity.  For example:

     a/b/c/D/e/foo.txt
     a/link -> ./b/c/d
     a/link/e/foo.txt
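
The symlink case can be reproduced with a small experiment.  This is only a
sketch: it uses lowercase "d" throughout so that it also runs on a
case-sensitive filesystem, it assumes the platform allows creating
symlinks, and the helper name is invented for the example.

```python
import os
import tempfile


def same_file(path_a, path_b):
    """True if both paths exist and resolve to one file (symlinks followed)."""
    return (os.path.exists(path_a) and os.path.exists(path_b)
            and os.path.samefile(path_a, path_b))


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b", "c", "d", "e"))
    target = os.path.join(root, "a", "b", "c", "d", "e", "foo.txt")
    with open(target, "w") as f:
        f.write("first\n")
    # a/link -> ./b/c/d, as in the listing above
    os.symlink(os.path.join(".", "b", "c", "d"), os.path.join(root, "a", "link"))
    via_link = os.path.join(root, "a", "link", "e", "foo.txt")
    aliased = same_file(target, via_link)
    # The two path strings differ, so a purely name-based check misses this.
    names_differ = os.path.normcase(target) != os.path.normcase(via_link)
```

Both paths reach the same file although no comparison of the names, folded
or not, would flag them.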

Also, some platforms (like the Mac) allow directory hard-links.
Granted, Git doesn't create hard-links during checkout, but the
user might.

I'm sure there are other edge cases here that make reporting
difficult; these are just a few I thought of.  I guess what I'm
trying to say is that as a first step just report that you found
a collision -- without trying to identify the set of existing objects
that it collided with.

> 
> At any rate, until we have an actual plan for Windows, I think it would
> make sense only to split the cases into "has working inodes" and
> "other", and make sure "other" does something sensible in the meantime
> (like mention the conflict, but skip trying to list duplicates).

Yes, this should be split.  Do the "easy" Linux version first.
Keep in mind that there may also be a different solution for the Mac.
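
One way to picture the split, as a hedged sketch rather than a proposal for
the actual patch: all names below are hypothetical, and the inode lookup
table is assumed to have been built while earlier entries were written out.
When no trustworthy inodes are available, or the inode is 0, the report
mentions the conflict but does not try to name the other file.

```python
def report_collisions(colliding, earlier_by_inode=None):
    """Build warning lines for paths that collided during checkout.

    colliding: list of (path, inode) pairs for entries that hit an
        existing file.
    earlier_by_inode: dict mapping inode -> path written earlier, or None
        on platforms without working inode numbers.
    """
    lines = []
    for path, inode in colliding:
        # Inode 0 is treated as "unknown" so that unrelated files are
        # never claimed to be the same.
        if earlier_by_inode is not None and inode and inode in earlier_by_inode:
            lines.append(f"{path} collides with {earlier_by_inode[inode]}")
        else:
            lines.append(f"{path} collides with an existing path")
    return lines


with_inodes = report_collisions([("FOO.txt", 42)], {42: "foo.txt"})
without_inodes = report_collisions([("FOO.txt", 0)], None)
zero_inode = report_collisions([("FOO.txt", 0)], {0: "bar.txt"})
```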

> When somebody wants to work on Windows support, then we can figure out
> if it just needs to wrap the "get unique identifier" operation, or if it
> would use a totally different algorithm.
> 
> -Peff
> 

Jeff
