git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] t6042: fix breakage on Windows
@ 2019-01-16 13:37 Johannes Schindelin via GitGitGadget
  2019-01-16 13:37 ` [PATCH 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-01-16 13:37 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Junio C Hamano

Unfortunately, Travis decided to change some things under the hood that
affect the Windows build. As a consequence, master has not been tested in
quite a while, even if the test runs pretended that it had been tested :-(

So imagine my surprise when master simply would refuse to pass the test
suite cleanly outside Travis, always failing at t6042, despite the fact that
Travis passed.

It turns out that two files are written too quickly in succession, running
into the issue where Git for Windows chooses not to populate the inode and
device numbers in the stat data (this is a noticeable performance
optimization). As a consequence, Git thinks the file is unchanged, and fails
to pick up a modification. And no, we cannot simply undo the performance
optimization, it would make things prohibitively slow in particular in large
worktrees, and it is not like the bug is likely to be hit in reality: for
Git to be fooled into thinking that a file is unchanged, it has to be
written with the same file size, and within a 100ns time bucket (it is
pretty improbable that there is any real-world scenario that would run into 
that, except of course our regression test suite).

This patch works around this issue by forcing Git to recognize the new file
versions as new files (which they really are: the patch simply replaces

git mv <old> <new> && mv <file> <new> && git add <new>`

by

git rm <old> && mv <file> <new> && git add <new>`

which is not shorter, but even a performance improvement (an unnoticeable
one, of course).

Johannes Schindelin (1):
  t6042: work around speed optimization on Windows

 t/t6042-merge-rename-corner-cases.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


base-commit: 77556354bb7ac50450e3b28999e3576969869068
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-109%2Fdscho%2Ffix-t6042-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-109/dscho/fix-t6042-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/109
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/1] t6042: work around speed optimization on Windows
  2019-01-16 13:37 [PATCH 0/1] t6042: fix breakage on Windows Johannes Schindelin via GitGitGadget
@ 2019-01-16 13:37 ` Johannes Schindelin via GitGitGadget
  2019-01-16 15:56   ` Elijah Newren
  2019-01-16 15:32 ` [PATCH 0/1] t6042: fix breakage " Elijah Newren
  2019-01-17  8:29 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-01-16 13:37 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When Git determines whether a file has changed, it looks at the mtime,
at the file size, and to detect changes even if the mtime is the same
(on Windows, the mtime granularity is 100ns, read: if two files are
written within the same 100ns time slot, they have the same mtime) and
even if the file size is the same, Git also looks at the inode/device
numbers.

This design obviously comes from a Linux background, where `lstat()`
calls were designed to be cheap.

On Windows, there is no `lstat()`. It has to be emulated. And while
obtaining the mtime and the file size is not all that expensive (you can
get both with a single `GetFileAttributesW()` call), obtaining the
equivalent of the inode and device numbers is very expensive (it
requires a call to `GetFileInformationByHandle()`, which in turn
requires a file handle, which is *a lot* more expensive than one might
imagine).

As it is very uncommon for developers to modify files within 100ns time
slots, Git for Windows chooses not to fill inode/device numbers
properly, but simply sets them to 0.

However, in t6042 the files file_v1 and file_v2 are typically written
within the same 100ns time slot, and they do not differ in file size. So
the minor modification is not picked up.

Let's work around this issue by avoiding the `git mv` calls in the
'mod6-setup: chains of rename/rename(1to2) and rename/rename(2to1)' test
case. The target files are overwritten anyway, so it is not like we
really rename those files. This fixes the issue because `git add` will
now add the files as new files (as opposed to existing, just renamed
files).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t6042-merge-rename-corner-cases.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index 7cc34e7579..09dfa8bd92 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -1175,7 +1175,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
 
 		# Handle the left side
 		git checkout L &&
-		git mv one three &&
+		git rm one two &&
 		mv -f file_v2 three &&
 		mv -f file_v5 two &&
 		git add two three &&
@@ -1183,7 +1183,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
 
 		# Handle the right side
 		git checkout R &&
-		git mv two three &&
+		git rm one two &&
 		mv -f file_v3 one &&
 		mv -f file_v6 three &&
 		git add one three &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/1] t6042: fix breakage on Windows
  2019-01-16 13:37 [PATCH 0/1] t6042: fix breakage on Windows Johannes Schindelin via GitGitGadget
  2019-01-16 13:37 ` [PATCH 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
@ 2019-01-16 15:32 ` Elijah Newren
  2019-01-16 20:41   ` Johannes Schindelin
  2019-01-17  8:29 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2019-01-16 15:32 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: Git Mailing List, Junio C Hamano

Hi Dscho,

On Wed, Jan 16, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Unfortunately, Travis decided to change some things under the hood that
> affect the Windows build. As a consequence, master has not been tested in
> quite a while, even if the test runs pretended that it had been tested :-(
>
> So imagine my surprise when master simply would refuse to pass the test
> suite cleanly outside Travis, always failing at t6042, despite the fact that
> Travis passed.
>
> It turns out that two files are written too quickly in succession, running
> into the issue where Git for Windows chooses not to populate the inode and
> device numbers in the stat data (this is a noticeable performance
> optimization). As a consequence, Git thinks the file is unchanged, and fails
> to pick up a modification. And no, we cannot simply undo the performance
> optimization, it would make things prohibitively slow in particular in large
> worktrees, and it is not like the bug is likely to be hit in reality: for
> Git to be fooled into thinking that a file is unchanged, it has to be
> written with the same file size, and within a 100ns time bucket (it is
> pretty improbable that there is any real-world scenario that would run into
> that, except of course our regression test suite).
>
> This patch works around this issue by forcing Git to recognize the new file
> versions as new files (which they really are: the patch simply replaces
>
> git mv <old> <new> && mv <file> <new> && git add <new>`
>
> by
>
> git rm <old> && mv <file> <new> && git add <new>`
>
> which is not shorter, but even a performance improvement (an unnoticeable
> one, of course).

Everything sounds good up to this final sentence.  I'm wondering if
I'm misunderstanding or if there were some simple editing errors; in
particular, I'm not sure what "even" means here (should it be left
out?), and it seems like you meant "noticeable" rather than
"unnoticeable" -- is that correct?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] t6042: work around speed optimization on Windows
  2019-01-16 13:37 ` [PATCH 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
@ 2019-01-16 15:56   ` Elijah Newren
  2019-01-16 20:30     ` Johannes Schindelin
  0 siblings, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2019-01-16 15:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin

Hi Dscho,

On Wed, Jan 16, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When Git determines whether a file has changed, it looks at the mtime,
> at the file size, and to detect changes even if the mtime is the same
> (on Windows, the mtime granularity is 100ns, read: if two files are
> written within the same 100ns time slot, they have the same mtime) and
> even if the file size is the same, Git also looks at the inode/device
> numbers.
>
> This design obviously comes from a Linux background, where `lstat()`
> calls were designed to be cheap.
>
> On Windows, there is no `lstat()`. It has to be emulated. And while
> obtaining the mtime and the file size is not all that expensive (you can
> get both with a single `GetFileAttributesW()` call), obtaining the
> equivalent of the inode and device numbers is very expensive (it
> requires a call to `GetFileInformationByHandle()`, which in turn
> requires a file handle, which is *a lot* more expensive than one might
> imagine).
>
> As it is very uncommon for developers to modify files within 100ns time
> slots, Git for Windows chooses not to fill inode/device numbers
> properly, but simply sets them to 0.
>
> However, in t6042 the files file_v1 and file_v2 are typically written
> within the same 100ns time slot, and they do not differ in file size. So
> the minor modification is not picked up.
>
> Let's work around this issue by avoiding the `git mv` calls in the
> 'mod6-setup: chains of rename/rename(1to2) and rename/rename(2to1)' test
> case. The target files are overwritten anyway, so it is not like we
> really rename those files. This fixes the issue because `git add` will
> now add the files as new files (as opposed to existing, just renamed
> files).

I actually read this before the cover letter (just responded in
opposite order), and the last paragraph alarmed me at first, because
it made it sound like it was dispensing with the need for rename
detection and thus breaking the intent of the testcase.  Granted,
looking at the code it becomes clear you're not changing the intent of
the testcase at all, just transforming the setup steps slightly.  In
your cover letter, you made this clear by stating that you were
replacing
    git mv <old> <new> && mv <file> <new> && git add <new>`
by
    git rm <old> && mv <file> <new> && git add <new>`
Perhaps something like that could be added here or other wording to
clarify that it's just an innocuous transformation of the setup steps?

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t6042-merge-rename-corner-cases.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
> index 7cc34e7579..09dfa8bd92 100755
> --- a/t/t6042-merge-rename-corner-cases.sh
> +++ b/t/t6042-merge-rename-corner-cases.sh
> @@ -1175,7 +1175,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
>
>                 # Handle the left side
>                 git checkout L &&
> -               git mv one three &&
> +               git rm one two &&

Here you not only remove the file being renamed ('one') but also a
file being modified ('two'); you didn't mention the latter in your
commit message.  Since both are added back later it'll still work
either way on linux, but was it suffering from the same problem on
Windows?  It too was a case of both the filesize before and after
remaining the same despite contents changing, so it certainly seem
possible.

>                 mv -f file_v2 three &&
>                 mv -f file_v5 two &&
>                 git add two three &&
> @@ -1183,7 +1183,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
>
>                 # Handle the right side
>                 git checkout R &&
> -               git mv two three &&
> +               git rm one two &&

Also here you remove both a renamed and a modified file -- though in
this case I think the file sizes change for both (each increases by
one character) so this hunk of the patch probably isn't needed.  It
doesn't hurt though, and could be considered future-proofing against
possible changes to the setup files, and may also just be nice to make
the two blocks of setup look more consistent

>                 mv -f file_v3 one &&
>                 mv -f file_v6 three &&
>                 git add one three &&
> --
> gitgitgadget


Thanks,
Elijah

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] t6042: work around speed optimization on Windows
  2019-01-16 15:56   ` Elijah Newren
@ 2019-01-16 20:30     ` Johannes Schindelin
  2019-01-16 21:13       ` Elijah Newren
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2019-01-16 20:30 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Git Mailing List,
	Junio C Hamano

Hi Elijah,

On Wed, 16 Jan 2019, Elijah Newren wrote:

> On Wed, Jan 16, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > When Git determines whether a file has changed, it looks at the mtime,
> > at the file size, and to detect changes even if the mtime is the same
> > (on Windows, the mtime granularity is 100ns, read: if two files are
> > written within the same 100ns time slot, they have the same mtime) and
> > even if the file size is the same, Git also looks at the inode/device
> > numbers.
> >
> > This design obviously comes from a Linux background, where `lstat()`
> > calls were designed to be cheap.
> >
> > On Windows, there is no `lstat()`. It has to be emulated. And while
> > obtaining the mtime and the file size is not all that expensive (you
> > can get both with a single `GetFileAttributesW()` call), obtaining the
> > equivalent of the inode and device numbers is very expensive (it
> > requires a call to `GetFileInformationByHandle()`, which in turn
> > requires a file handle, which is *a lot* more expensive than one might
> > imagine).
> >
> > As it is very uncommon for developers to modify files within 100ns
> > time slots, Git for Windows chooses not to fill inode/device numbers
> > properly, but simply sets them to 0.
> >
> > However, in t6042 the files file_v1 and file_v2 are typically written
> > within the same 100ns time slot, and they do not differ in file size.
> > So the minor modification is not picked up.
> >
> > Let's work around this issue by avoiding the `git mv` calls in the
> > 'mod6-setup: chains of rename/rename(1to2) and rename/rename(2to1)'
> > test case. The target files are overwritten anyway, so it is not like
> > we really rename those files. This fixes the issue because `git add`
> > will now add the files as new files (as opposed to existing, just
> > renamed files).
> 
> I actually read this before the cover letter (just responded in opposite
> order), and the last paragraph alarmed me at first, because it made it
> sound like it was dispensing with the need for rename detection and thus
> breaking the intent of the testcase.

Right, the cover letter has the benefit of being written last (so I have
thought about the issue the longest before casting my thoughts into
words while writing it, longer than when I come up with the commit
message, which is typically even before I test things thoroughly).

> Granted, looking at the code it becomes clear you're not changing the
> intent of the testcase at all, just transforming the setup steps
> slightly.  In your cover letter, you made this clear by stating that you
> were replacing
>     git mv <old> <new> && mv <file> <new> && git add <new>`
> by
>     git rm <old> && mv <file> <new> && git add <new>`
> Perhaps something like that could be added here or other wording to
> clarify that it's just an innocuous transformation of the setup steps?

Sure. I added this paragraph:

	Functionally, we do not change anything because we replace two
	`git mv <old> <new>` calls (where `<new>` is completely
	overwritten and `git add`ed later anyway) by `git rm <old>` calls
	(removing other files, too, that are also completely overwritten
	and `git add`ed later).

This should explain the intention nicely, would you agree?

> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  t/t6042-merge-rename-corner-cases.sh | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
> > index 7cc34e7579..09dfa8bd92 100755
> > --- a/t/t6042-merge-rename-corner-cases.sh
> > +++ b/t/t6042-merge-rename-corner-cases.sh
> > @@ -1175,7 +1175,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
> >
> >                 # Handle the left side
> >                 git checkout L &&
> > -               git mv one three &&
> > +               git rm one two &&
> 
> Here you not only remove the file being renamed ('one') but also a
> file being modified ('two'); you didn't mention the latter in your
> commit message.

True. But it *is* replaced and `git add`ed later, too.

> Since both are added back later it'll still work either way on linux,
> but was it suffering from the same problem on Windows?

Yes. All 6 versions of the file are written in quick succession. So yes,
*any* of these 6 versions that are intended to replace another of said 6
versions will suffer from the exact same problem.

> It too was a case of both the filesize before and after remaining the
> same despite contents changing, so it certainly seem possible.

Yep.

> >                 mv -f file_v2 three &&
> >                 mv -f file_v5 two &&
> >                 git add two three &&
> > @@ -1183,7 +1183,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
> >
> >                 # Handle the right side
> >                 git checkout R &&
> > -               git mv two three &&
> > +               git rm one two &&
> 
> Also here you remove both a renamed and a modified file -- though in
> this case I think the file sizes change for both (each increases by
> one character) so this hunk of the patch probably isn't needed.

It did not make sense for me to use two *different* ways to perform
essentially the same operation. I know I would scratch my head until it
hurts if I read intentionally different command cadences, to figure out
why they are different. Only to find out that they are not intended to be
different? I don't think so.

> It doesn't hurt though, and could be considered future-proofing against
> possible changes to the setup files, and may also just be nice to make
> the two blocks of setup look more consistent

Indeed. I would *hate* to lose time over reading those test cases in 6
months, just because they use different paradigms (even if they want to do
the same thing).

And I have to spend *way* too much time reading test scripts. So I am very
adamant about making my own life easier here.

Point in case: it took me quite a good while to understand the code that I
fixed in this patch, because the underlying intention was not immediately
accessible to me. The test failure happens in the *next* test case, after
all. That was a head scratcher, too: the problem was in a test case that
*passed*, not in the one that failed.

It all comes back to what I have been harping on for at least a year: our
test suite is not really optimized for figuring out how to fix test
failures. There is nothing immediately actionable when anything fails. A
developer has to spend considerable time to find out what is going wrong
if a test failure happens (is it flakey? Is it a bug in the *test*? Is it
a *true regression*?)

Of course, this is not a new problem. And I *do* see that some
contributors are as interested as I am in a more useful test suite, one
where a failed test *really* indicates a regression, one where the output
of a failed test has enough evidence to start fixing the bug right away
(rather than spending time on understanding the intention of what was
tested first). For example, when `wc -l` invocations were replaced by
`test_line_count`, not only did we address portability problems (where BSD
`wc` would indent the output), we also changed the code to indicate quite
clearly *what* was tested. We do not yet output the contents of the file
when `test_i18ngrep` fails, but that would also be a step in the right
direction.

Here's hoping that our test suite improves in that respect: to make it
easier and quicker to fix regressions.

Ciao,
Dscho

> >                 mv -f file_v3 one &&
> >                 mv -f file_v6 three &&
> >                 git add one three &&
> > --
> > gitgitgadget
> 
> 
> Thanks,
> Elijah
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/1] t6042: fix breakage on Windows
  2019-01-16 15:32 ` [PATCH 0/1] t6042: fix breakage " Elijah Newren
@ 2019-01-16 20:41   ` Johannes Schindelin
  0 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2019-01-16 20:41 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Git Mailing List,
	Junio C Hamano

Hi Elijah,

On Wed, 16 Jan 2019, Elijah Newren wrote:

> Hi Dscho,
> 
> On Wed, Jan 16, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > Unfortunately, Travis decided to change some things under the hood that
> > affect the Windows build. As a consequence, master has not been tested in
> > quite a while, even if the test runs pretended that it had been tested :-(
> >
> > So imagine my surprise when master simply would refuse to pass the test
> > suite cleanly outside Travis, always failing at t6042, despite the fact that
> > Travis passed.
> >
> > It turns out that two files are written too quickly in succession, running
> > into the issue where Git for Windows chooses not to populate the inode and
> > device numbers in the stat data (this is a noticeable performance
> > optimization). As a consequence, Git thinks the file is unchanged, and fails
> > to pick up a modification. And no, we cannot simply undo the performance
> > optimization, it would make things prohibitively slow in particular in large
> > worktrees, and it is not like the bug is likely to be hit in reality: for
> > Git to be fooled into thinking that a file is unchanged, it has to be
> > written with the same file size, and within a 100ns time bucket (it is
> > pretty improbable that there is any real-world scenario that would run into
> > that, except of course our regression test suite).
> >
> > This patch works around this issue by forcing Git to recognize the new file
> > versions as new files (which they really are: the patch simply replaces
> >
> > git mv <old> <new> && mv <file> <new> && git add <new>`
> >
> > by
> >
> > git rm <old> && mv <file> <new> && git add <new>`
> >
> > which is not shorter, but even a performance improvement (an unnoticeable
> > one, of course).
> 
> Everything sounds good up to this final sentence.  I'm wondering if
> I'm misunderstanding or if there were some simple editing errors; in
> particular, I'm not sure what "even" means here (should it be left
> out?), and it seems like you meant "noticeable" rather than
> "unnoticeable" -- is that correct?

All I meant is that the test script now has less to do: rather than rename
a file and overwrite it, it is simply deleted and written anew. In the
filesystem layer, I would expect this to be a tiny fraction of a
millisecond faster. Not enough to matter, but conceptually it is simpler.

Since Git does not record renames (quite intentionally so, as can be
verified by reading the rather unruly arguments back in the day,
disregarding even valid arguments such as: if we *do* know better than
Git, why not provide a way to tell Git about renames?), there is no need
to use `git mv` in this instance, either.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] t6042: work around speed optimization on Windows
  2019-01-16 20:30     ` Johannes Schindelin
@ 2019-01-16 21:13       ` Elijah Newren
  2019-01-17  9:35         ` Johannes Schindelin
  0 siblings, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2019-01-16 21:13 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, Git Mailing List,
	Junio C Hamano

Hi Dscho,

On Wed, Jan 16, 2019 at 12:31 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Wed, 16 Jan 2019, Elijah Newren wrote:
>
> > On Wed, Jan 16, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > >
> > > When Git determines whether a file has changed, it looks at the mtime,
> > > at the file size, and to detect changes even if the mtime is the same
> > > (on Windows, the mtime granularity is 100ns, read: if two files are
> > > written within the same 100ns time slot, they have the same mtime) and
> > > even if the file size is the same, Git also looks at the inode/device
> > > numbers.
> > >
> > > This design obviously comes from a Linux background, where `lstat()`
> > > calls were designed to be cheap.
> > >
> > > On Windows, there is no `lstat()`. It has to be emulated. And while
> > > obtaining the mtime and the file size is not all that expensive (you
> > > can get both with a single `GetFileAttributesW()` call), obtaining the
> > > equivalent of the inode and device numbers is very expensive (it
> > > requires a call to `GetFileInformationByHandle()`, which in turn
> > > requires a file handle, which is *a lot* more expensive than one might
> > > imagine).
> > >
> > > As it is very uncommon for developers to modify files within 100ns
> > > time slots, Git for Windows chooses not to fill inode/device numbers
> > > properly, but simply sets them to 0.
> > >
> > > However, in t6042 the files file_v1 and file_v2 are typically written
> > > within the same 100ns time slot, and they do not differ in file size.
> > > So the minor modification is not picked up.
> > >
> > > Let's work around this issue by avoiding the `git mv` calls in the
> > > 'mod6-setup: chains of rename/rename(1to2) and rename/rename(2to1)'
> > > test case. The target files are overwritten anyway, so it is not like
> > > we really rename those files. This fixes the issue because `git add`
> > > will now add the files as new files (as opposed to existing, just
> > > renamed files).
> >
> > I actually read this before the cover letter (just responded in opposite
> > order), and the last paragraph alarmed me at first, because it made it
> > sound like it was dispensing with the need for rename detection and thus
> > breaking the intent of the testcase.
>
> Right, the cover letter has the benefit of being written last (so I have
> thought about the issue the longest before casting my thoughts into
> words while writing it, longer than when I come up with the commit
> message, which is typically even before I test things thoroughly).
>
> > Granted, looking at the code it becomes clear you're not changing the
> > intent of the testcase at all, just transforming the setup steps
> > slightly.  In your cover letter, you made this clear by stating that you
> > were replacing
> >     git mv <old> <new> && mv <file> <new> && git add <new>`
> > by
> >     git rm <old> && mv <file> <new> && git add <new>`
> > Perhaps something like that could be added here or other wording to
> > clarify that it's just an innocuous transformation of the setup steps?
>
> Sure. I added this paragraph:
>
>         Functionally, we do not change anything because we replace two
>         `git mv <old> <new>` calls (where `<new>` is completely
>         overwritten and `git add`ed later anyway) by `git rm <old>` calls
>         (removing other files, too, that are also completely overwritten
>         and `git add`ed later).
>
> This should explain the intention nicely, would you agree?

Yep, thanks.  :-)

> > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > > ---
> > >  t/t6042-merge-rename-corner-cases.sh | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
> > > index 7cc34e7579..09dfa8bd92 100755
> > > --- a/t/t6042-merge-rename-corner-cases.sh
> > > +++ b/t/t6042-merge-rename-corner-cases.sh
> > > @@ -1175,7 +1175,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
> > >
> > >                 # Handle the left side
> > >                 git checkout L &&
> > > -               git mv one three &&
> > > +               git rm one two &&
> >
> > Here you not only remove the file being renamed ('one') but also a
> > file being modified ('two'); you didn't mention the latter in your
> > commit message.
>
> True. But it *is* replaced and `git add`ed later, too.
>
> > Since both are added back later it'll still work either way on linux,
> > but was it suffering from the same problem on Windows?
>
> Yes. All 6 versions of the file are written in quick succession. So yes,
> *any* of these 6 versions that are intended to replace another of said 6
> versions will suffer from the exact same problem.

This was more just to note that the commit message didn't match the
actual change, because the commit message implied it was just doing
this for "renamed" files.  At most it was a minor nit, though I used
it to double check whether I was understanding the issue.  Thanks for
verifying.

> > It too was a case of both the filesize before and after remaining the
> > same despite contents changing, so it certainly seem possible.
>
> Yep.
>
> > >                 mv -f file_v2 three &&
> > >                 mv -f file_v5 two &&
> > >                 git add two three &&
> > > @@ -1183,7 +1183,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
> > >
> > >                 # Handle the right side
> > >                 git checkout R &&
> > > -               git mv two three &&
> > > +               git rm one two &&
> >
> > Also here you remove both a renamed and a modified file -- though in
> > this case I think the file sizes change for both (each increases by
> > one character) so this hunk of the patch probably isn't needed.
>
> It did not make sense for me to use two *different* ways to perform
> essentially the same operation. I know I would scratch my head until it
> hurts if I read intentionally different command cadences, to figure out
> why they are different. Only to find out that they are not intended to be
> different? I don't think so.
>
> > It doesn't hurt though, and could be considered future-proofing against
> > possible changes to the setup files, and may also just be nice to make
> > the two blocks of setup look more consistent
>
> Indeed. I would *hate* to lose time over reading those test cases in 6
> months, just because they use different paradigms (even if they want to do
> the same thing).
>
> And I have to spend *way* too much time reading test scripts. So I am very
> adamant about making my own life easier here.

Oh, I like the consistency better to.  Sorry if I came across
differently, it was more a stream of thought as I looked over the
patch, which I recorded to just note that I had in fact read and
checked it and it all looks good.

> Point in case: it took me quite a good while to understand the code that I
> fixed in this patch, because the underlying intention was not immediately
> accessible to me. The test failure happens in the *next* test case, after
> all. That was a head scratcher, too: the problem was in a test case that
> *passed*, not in the one that failed.

Hmm...I had adopted this style based on my own struggles to understand
regression tests.  I had gotten pretty frustrated finding a failure in
test #87, and finding out that its setup was spread throughout two
dozen of the preceding 86 tests, with another three dozen changes to
the repository in the preceding 86 tests that happen to be irrelevant.
I like the setup for a test being separately contained.  Secondarily,
though, there were a number of tests where I found it hard to see what
they were meaning to test because of a huge mixture of setup and
testing in the same test.  Putting the setup in one test and then the
actual thing of interest being tested in a subsequent test made that a
whole lot clearer.

...or so I thought.  I'm sorry that it to have made it worse for you.
I was trying to make things clearer.

> It all comes back to what I have been harping on for at least a year: our
> test suite is not really optimized for figuring out how to fix test
> failures. There is nothing immediately actionable when anything fails. A
> developer has to spend considerable time to find out what is going wrong
> if a test failure happens (is it flakey? Is it a bug in the *test*? Is it
> a *true regression*?)
>
> Of course, this is not a new problem. And I *do* see that some
> contributors are as interested as I am in a more useful test suite, one
> where a failed test *really* indicates a regression, one where the output
> of a failed test has enough evidence to start fixing the bug right away
> (rather than spending time on understanding the intention of what was
> tested first). For example, when `wc -l` invocations were replaced by
> `test_line_count`, not only did we address portability problems (where BSD
> `wc` would indent the output), we also changed the code to indicate quite
> clearly *what* was tested. We do not yet output the contents of the file
> when `test_i18ngrep` fails, but that would also be a step in the right
> direction.
>
> Here's hoping that our test suite improves in that respect: to make it
> easier and quicker to fix regressions.

Yep, agreed.


Anyway, with the wording change to the commit message you mentioned
above, feel free to add

Reviewed-by: Elijah Newren <newren@gmail.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 0/1] t6042: fix breakage on Windows
  2019-01-16 13:37 [PATCH 0/1] t6042: fix breakage on Windows Johannes Schindelin via GitGitGadget
  2019-01-16 13:37 ` [PATCH 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
  2019-01-16 15:32 ` [PATCH 0/1] t6042: fix breakage " Elijah Newren
@ 2019-01-17  8:29 ` Johannes Schindelin via GitGitGadget
  2019-01-17  8:29   ` [PATCH v2 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-01-17  8:29 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Junio C Hamano

Unfortunately, Travis decided to change some things under the hood that
affect the Windows build. As a consequence, master has not been tested in
quite a while, even if the test runs pretended that it had been tested :-(

So imagine my surprise when master simply would refuse to pass the test
suite cleanly outside Travis, always failing at t6042, despite the fact that
Travis passed.

It turns out that two files are written too quickly in succession, running
into the issue where Git for Windows chooses not to populate the inode and
device numbers in the stat data (this is a noticeable performance
optimization). As a consequence, Git thinks the file is unchanged, and fails
to pick up a modification. And no, we cannot simply undo the performance
optimization, it would make things prohibitively slow in particular in large
worktrees, and it is not like the bug is likely to be hit in reality: for
Git to be fooled into thinking that a file is unchanged, it has to be
written with the same file size, and within a 100ns time bucket (it is
pretty improbable that there is any real-world scenario that would run into 
that, except of course our regression test suite).

This patch works around this issue by forcing Git to recognize the new file
versions as new files (which they really are: the patch simply replaces

git mv <old> <new> && mv <file> <new> && git add <new>`

by

git rm <old> && mv <file> <new> && git add <new>`

which is not shorter, but even a performance improvement (an unnoticeable
one, of course).

Changes since v1:

 * Clarified the change in the commit message.

Johannes Schindelin (1):
  t6042: work around speed optimization on Windows

 t/t6042-merge-rename-corner-cases.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


base-commit: 77556354bb7ac50450e3b28999e3576969869068
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-109%2Fdscho%2Ffix-t6042-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-109/dscho/fix-t6042-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/109

Range-diff vs v1:

 1:  598de6652c ! 1:  55b0a641d4 t6042: work around speed optimization on Windows
     @@ -35,6 +35,12 @@
          now add the files as new files (as opposed to existing, just renamed
          files).
      
     +    Functionally, we do not change anything because we replace two `git mv
     +    <old> <new>` calls (where `<new>` is completely overwritten and `git
     +    add`ed later anyway) by `git rm <old>` calls (removing other files, too,
     +    that are also completely overwritten and `git add`ed later).
     +
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/1] t6042: work around speed optimization on Windows
  2019-01-17  8:29 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
@ 2019-01-17  8:29   ` Johannes Schindelin via GitGitGadget
  2019-01-17 20:57     ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-01-17  8:29 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When Git determines whether a file has changed, it looks at the mtime,
at the file size, and to detect changes even if the mtime is the same
(on Windows, the mtime granularity is 100ns, read: if two files are
written within the same 100ns time slot, they have the same mtime) and
even if the file size is the same, Git also looks at the inode/device
numbers.

This design obviously comes from a Linux background, where `lstat()`
calls were designed to be cheap.

On Windows, there is no `lstat()`. It has to be emulated. And while
obtaining the mtime and the file size is not all that expensive (you can
get both with a single `GetFileAttributesW()` call), obtaining the
equivalent of the inode and device numbers is very expensive (it
requires a call to `GetFileInformationByHandle()`, which in turn
requires a file handle, which is *a lot* more expensive than one might
imagine).

As it is very uncommon for developers to modify files within 100ns time
slots, Git for Windows chooses not to fill inode/device numbers
properly, but simply sets them to 0.

However, in t6042 the files file_v1 and file_v2 are typically written
within the same 100ns time slot, and they do not differ in file size. So
the minor modification is not picked up.

Let's work around this issue by avoiding the `git mv` calls in the
'mod6-setup: chains of rename/rename(1to2) and rename/rename(2to1)' test
case. The target files are overwritten anyway, so it is not like we
really rename those files. This fixes the issue because `git add` will
now add the files as new files (as opposed to existing, just renamed
files).

Functionally, we do not change anything because we replace two `git mv
<old> <new>` calls (where `<new>` is completely overwritten and `git
add`ed later anyway) by `git rm <old>` calls (removing other files, too,
that are also completely overwritten and `git add`ed later).

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t6042-merge-rename-corner-cases.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index 7cc34e7579..09dfa8bd92 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -1175,7 +1175,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
 
 		# Handle the left side
 		git checkout L &&
-		git mv one three &&
+		git rm one two &&
 		mv -f file_v2 three &&
 		mv -f file_v5 two &&
 		git add two three &&
@@ -1183,7 +1183,7 @@ test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
 
 		# Handle the right side
 		git checkout R &&
-		git mv two three &&
+		git rm one two &&
 		mv -f file_v3 one &&
 		mv -f file_v6 three &&
 		git add one three &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] t6042: work around speed optimization on Windows
  2019-01-16 21:13       ` Elijah Newren
@ 2019-01-17  9:35         ` Johannes Schindelin
  0 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2019-01-17  9:35 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Git Mailing List,
	Junio C Hamano

Hi Elijah,

On Wed, 16 Jan 2019, Elijah Newren wrote:

> On Wed, Jan 16, 2019 at 12:31 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Point in case: it took me quite a good while to understand the code
> > that I fixed in this patch, because the underlying intention was not
> > immediately accessible to me. The test failure happens in the *next*
> > test case, after all. That was a head scratcher, too: the problem was
> > in a test case that *passed*, not in the one that failed.
> 
> Hmm...I had adopted this style based on my own struggles to understand
> regression tests.  I had gotten pretty frustrated finding a failure in
> test #87, and finding out that its setup was spread throughout two dozen
> of the preceding 86 tests, with another three dozen changes to the
> repository in the preceding 86 tests that happen to be irrelevant.  I
> like the setup for a test being separately contained.  Secondarily,
> though, there were a number of tests where I found it hard to see what
> they were meaning to test because of a huge mixture of setup and testing
> in the same test.  Putting the setup in one test and then the actual
> thing of interest being tested in a subsequent test made that a whole
> lot clearer.
> 
> ...or so I thought.  I'm sorry that it to have made it worse for you.  I
> was trying to make things clearer.

Thank you for caring. Truth be told: while it took me like 15 minutes to
figure out that the problem was in another test case than the failing one,
which is not my personal record (I forget the details, but I am fairly
certain that there was a test suite failure that I had to hunt for
multiple hours). So it is not all that bad.

And yes, having a single setup test case rather than dozens is definitely
a good thing.

I just miss the expressiveness of object-oriented test frameworks, where
you would define data structures with intuitive method names, where I
would have immediately thought: oh, so this wanted to merge diverging
histories, but, but, but, the first arm does not even have a change...
now, where should that have been changed?

As I said, your test cases are not the most convoluted ones, and it is so
good of you not to add more test cases relying on side effects from
previous non-setup test cases (there are quite a few offenders in the test
suite, and I am ashamed to admit that even I added some.)

> Anyway, with the wording change to the commit message you mentioned
> above, feel free to add
> 
> Reviewed-by: Elijah Newren <newren@gmail.com>

Done, thank you so much!
Dscho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/1] t6042: work around speed optimization on Windows
  2019-01-17  8:29   ` [PATCH v2 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
@ 2019-01-17 20:57     ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2019-01-17 20:57 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Elijah Newren, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When Git determines whether a file has changed, it looks at the mtime,
> at the file size, ...
> Functionally, we do not change anything because we replace two `git mv
> <old> <new>` calls (where `<new>` is completely overwritten and `git
> add`ed later anyway) by `git rm <old>` calls (removing other files, too,
> that are also completely overwritten and `git add`ed later).

Thanks, both.  Will queue.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-01-17 20:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-16 13:37 [PATCH 0/1] t6042: fix breakage on Windows Johannes Schindelin via GitGitGadget
2019-01-16 13:37 ` [PATCH 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
2019-01-16 15:56   ` Elijah Newren
2019-01-16 20:30     ` Johannes Schindelin
2019-01-16 21:13       ` Elijah Newren
2019-01-17  9:35         ` Johannes Schindelin
2019-01-16 15:32 ` [PATCH 0/1] t6042: fix breakage " Elijah Newren
2019-01-16 20:41   ` Johannes Schindelin
2019-01-17  8:29 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2019-01-17  8:29   ` [PATCH v2 1/1] t6042: work around speed optimization " Johannes Schindelin via GitGitGadget
2019-01-17 20:57     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).