git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Git on Windows maps creation time onto changed time
@ 2018-02-01 23:10 Keith Goldfarb
  2018-02-01 23:22 ` Ævar Arnfjörð Bjarmason
  2018-02-02 17:38 ` Johannes Sixt
  0 siblings, 2 replies; 7+ messages in thread
From: Keith Goldfarb @ 2018-02-01 23:10 UTC (permalink / raw)
  To: git

Dear git,

While tracking down a problem with a filesystem shared by Windows and Ubuntu, I came across the following code in compat/mingw.c (ming_fstat(), also in do_lstat()):

	if (GetFileInformationByHandle(fh, &fdata)) {
		buf->st_ino = 0;
		buf->st_gid = 0;
		buf->st_uid = 0;
		buf->st_nlink = 1;
		buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
		buf->st_size = fdata.nFileSizeLow |
			(((off_t)fdata.nFileSizeHigh)<<32);
		buf->st_dev = buf->st_rdev = 0; /* not used by Git */
		buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
		buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
		buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
		return 0;
	}

The assignment of buf->st_ctime doesn’t seem right to me. I understand there’s no good choice here, but I think a better choice would be to duplicate the definition used for st_mtime.

Background: When I do a git status on Windows and then later on Ubuntu (or the other order), it is extremely slow, as the entire tree is being traversed. I tracked it down to this difference in definition of c_time. Yes, I know about the core.trustctime variable, but my problem aside this seems like an unwise choice.

Thanks for listening,

K.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git on Windows maps creation time onto changed time
  2018-02-01 23:10 Git on Windows maps creation time onto changed time Keith Goldfarb
@ 2018-02-01 23:22 ` Ævar Arnfjörð Bjarmason
  2018-02-02 17:38 ` Johannes Sixt
  1 sibling, 0 replies; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-01 23:22 UTC (permalink / raw)
  To: Keith Goldfarb; +Cc: Git Mailing List, Marius Storm-Olsen

On Fri, Feb 2, 2018 at 12:10 AM, Keith Goldfarb
<keith@blackthorn-media.com> wrote:
> Dear git,
>
> While tracking down a problem with a filesystem shared by Windows and Ubuntu, I came across the following code in compat/mingw.c (ming_fstat(), also in do_lstat()):
>
>         if (GetFileInformationByHandle(fh, &fdata)) {
>                 buf->st_ino = 0;
>                 buf->st_gid = 0;
>                 buf->st_uid = 0;
>                 buf->st_nlink = 1;
>                 buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
>                 buf->st_size = fdata.nFileSizeLow |
>                         (((off_t)fdata.nFileSizeHigh)<<32);
>                 buf->st_dev = buf->st_rdev = 0; /* not used by Git */
>                 buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
>                 buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
>                 buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
>                 return 0;
>         }
>
> The assignment of buf->st_ctime doesn’t seem right to me. I understand there’s no good choice here, but I think a better choice would be to duplicate the definition used for st_mtime.
>
> Background: When I do a git status on Windows and then later on Ubuntu (or the other order), it is extremely slow, as the entire tree is being traversed. I tracked it down to this difference in definition of c_time. Yes, I know about the core.trustctime variable, but my problem aside this seems like an unwise choice.
>
> Thanks for listening,

Let's CC Marius Storm-Olsen who added this behavior (CC'd).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git on Windows maps creation time onto changed time
  2018-02-01 23:10 Git on Windows maps creation time onto changed time Keith Goldfarb
  2018-02-01 23:22 ` Ævar Arnfjörð Bjarmason
@ 2018-02-02 17:38 ` Johannes Sixt
  2018-02-02 18:45   ` Keith Goldfarb
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Sixt @ 2018-02-02 17:38 UTC (permalink / raw)
  To: Keith Goldfarb; +Cc: git

Am 02.02.2018 um 00:10 schrieb Keith Goldfarb:
> Dear git,
> 
> While tracking down a problem with a filesystem shared by Windows and Ubuntu, I came across the following code in compat/mingw.c (ming_fstat(), also in do_lstat()):
> 
> 	if (GetFileInformationByHandle(fh, &fdata)) {
> 		buf->st_ino = 0;
> 		buf->st_gid = 0;
> 		buf->st_uid = 0;
> 		buf->st_nlink = 1;
> 		buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
> 		buf->st_size = fdata.nFileSizeLow |
> 			(((off_t)fdata.nFileSizeHigh)<<32);
> 		buf->st_dev = buf->st_rdev = 0; /* not used by Git */
> 		buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
> 		buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
> 		buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
> 		return 0;
> 	}
> 
> The assignment of buf->st_ctime doesn’t seem right to me. I
> understand there’s no good choice here, but I think a better choice
> would be to  duplicate the definition used for st_mtime.

The purpose of these values is to allow to notice a change on the file 
system without going through the actual file data. Duplicating st_mtime 
would be pointless.

> Background: When I do a git status on Windows and then later on
> Ubuntu (or the other order), it is extremely slow, as the entire tree
> is being traversed. I tracked it down to this difference in
> definition of c_time. Yes, I know about the core.trustctime variable,
> but my problem aside  this seems like an unwise choice.

Don't do that then. Use core.trustctime.

-- Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git on Windows maps creation time onto changed time
  2018-02-02 17:38 ` Johannes Sixt
@ 2018-02-02 18:45   ` Keith Goldfarb
  2018-02-02 21:18     ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Goldfarb @ 2018-02-02 18:45 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

> The purpose of these values is to allow to notice a change on the file system without going through the actual file data. Duplicating st_mtime would be pointless.

Well, I agree with the first statement. I disagree with the 2nd. It’s not pointless to fix a problem, and in theory the creation date of a file can never change.

> Don't do that then. Use core.trustctime.

I am. Unfortunately, my problem isn’t solved by that alone. Perhaps this deserves its own thread, but on Windows the st_ino field is set to zero. This can also trigger a false positive, causing the whole cache to be rebuilt unnecessarily.

K.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git on Windows maps creation time onto changed time
  2018-02-02 18:45   ` Keith Goldfarb
@ 2018-02-02 21:18     ` Johannes Schindelin
  2018-02-02 22:44       ` Keith Goldfarb
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2018-02-02 21:18 UTC (permalink / raw)
  To: Keith Goldfarb; +Cc: Johannes Sixt, git

[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]

Hi Keith,

On Fri, 2 Feb 2018, Keith Goldfarb wrote:

> > The purpose of these values is to allow to notice a change on the file
> > system without going through the actual file data. Duplicating
> > st_mtime would be pointless.
> 
> Well, I agree with the first statement. I disagree with the 2nd. It’s
> not pointless to fix a problem, and in theory the creation date of a
> file can never change.
> 
> > Don't do that then. Use core.trustctime.
> 
> I am. Unfortunately, my problem isn’t solved by that alone. Perhaps this
> deserves its own thread, but on Windows the st_ino field is set to zero.
> This can also trigger a false positive, causing the whole cache to be
> rebuilt unnecessarily.

You cannot assume that the inode numbers are identical between file
systems/operating systems. That's simply not going to work.

I know you really want *so hard* for the same working directory to be
accessible from both Windows and Linux. I have a lot of sympathy for that
sentiment. Though I do not see much chance for success on that front.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git on Windows maps creation time onto changed time
  2018-02-02 21:18     ` Johannes Schindelin
@ 2018-02-02 22:44       ` Keith Goldfarb
  2018-02-02 23:01         ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Goldfarb @ 2018-02-02 22:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, git

> On Feb 2, 2018, at 1:18 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> You cannot assume that the inode numbers are identical between file
> systems/operating systems. That's simply not going to work.

Yes, I agree with you, I cannot assume this, so I checked. In my case, the inode numbers are indeed identical.

> I know you really want *so hard* for the same working directory to be
> accessible from both Windows and Linux. I have a lot of sympathy for that
> sentiment. Though I do not see much chance for success on that front.

I’m certainly willing to accept that there are going to be limitations by using a filesystem from two different operating systems. But regardless of the problems caused by that pattern, would you agree that the Windows code should be using the actual inode number (well, 32-bits of it) instead of zero?

K.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Git on Windows maps creation time onto changed time
  2018-02-02 22:44       ` Keith Goldfarb
@ 2018-02-02 23:01         ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-02-02 23:01 UTC (permalink / raw)
  To: Keith Goldfarb; +Cc: Johannes Sixt, git

[-- Attachment #1: Type: text/plain, Size: 2699 bytes --]

Hi Keith,

On Fri, 2 Feb 2018, Keith Goldfarb wrote:

> > On Feb 2, 2018, at 1:18 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > You cannot assume that the inode numbers are identical between file
> > systems/operating systems. That's simply not going to work.
> 
> Yes, I agree with you, I cannot assume this, so I checked. In my case, the inode numbers are indeed identical.
> 
> > I know you really want *so hard* for the same working directory to be
> > accessible from both Windows and Linux. I have a lot of sympathy for that
> > sentiment. Though I do not see much chance for success on that front.
> 
> I’m certainly willing to accept that there are going to be limitations by using a filesystem from two different operating systems. But regardless of the problems caused by that pattern, would you agree that the Windows code should be using the actual inode number (well, 32-bits of it) instead of zero?

In all likelihood, you have enabled the FS Cache feature. And for a good
reason: it makes many Git operations much, much faster.

Still not as fast as on Linux because Git is quite tuned to assume all the
Linux semantics. So instead of using tell-tales for file modifications
that are portable, Git uses tell-tales that are readily available on
Linux. And instead of using access patterns that are portable, Git assumes
POSIX semantics where you simply use opendir()/readdir()/closedir() and in
that inner loop, you call lstat() to get all the stat data.

This is obviously not portable beyond POSIX. On Windows, for example,
there are really fast FindFirstFile()/FindNextFile() APIs that even allow
you to specify patterns to match (which Linux has to do manually, by
inspecting the file name obtained via readdir()). You even get some stat
data back at the same time, really fast, without additional API calls,
unlike on Linux.

But Git's access patterns really are tuned to the *dir() way, and we have
to emulate it in Git for Windows. This is very, very slow.

The FS Cache feature tries to gain back some speed by using
FindFirstFile()/FindNextFile() pre-emptively and caching the data, looking
it up upon the next time stat data is queried.

The stat data obtained that way does not, however, include Change Time nor
inode number.

And that, dear children, was the story for the day: this is why Git for
Windows uses the CreationTime instead of the Change Time and why it marks
inode as 0; Because to do it accurately would be a lot slower, and we
cannot afford even slower because we are already super slow because we
have to work around Git's assumptions as to what APIs are available.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-02-02 23:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-01 23:10 Git on Windows maps creation time onto changed time Keith Goldfarb
2018-02-01 23:22 ` Ævar Arnfjörð Bjarmason
2018-02-02 17:38 ` Johannes Sixt
2018-02-02 18:45   ` Keith Goldfarb
2018-02-02 21:18     ` Johannes Schindelin
2018-02-02 22:44       ` Keith Goldfarb
2018-02-02 23:01         ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).