git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] deterministic commit timestamps in tests
@ 2020-07-09 20:33 Jeff King
  2020-07-09 20:34 ` [PATCH 1/4] t6000: use test_tick consistently Jeff King
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Jeff King @ 2020-07-09 20:33 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys

Most tests use test_tick or test_commit to get deterministic timestamps
in commits. Some don't because they don't care about the timestamps. But
they can still be annoying to debug, since the output changes from run
to run.

This series sets a default timestamp for those scripts, in a way that
doesn't interfere with anybody using test_tick. The change is in the
final commit. The others are preparatory to deal with fallout in the
test suite. Normally fallout would give me pause about whether this is a
good idea, but a) there's not much of it and b) I think it helped shake
out questionable assumptions in those tests.

This a replacement for the patches being discussed in:

  https://lore.kernel.org/git/pull.816.git.git.1594149804313.gitgitgadget@gmail.com/

It looks like Junio already picked up one of the fixes as
jk/t6000-timestamp-fix. The first patch here is identical (so we can
either drop that branch, or drop the first patch from this series and
apply on top).

  [1/4]: t6000: use test_tick consistently
  [2/4]: t9700: loosen ident timezone regex
  [3/4]: t5539: make timestamp requirements more explicit
  [4/4]: test-lib: set deterministic default author/committer date

 t/t5539-fetch-http-shallow.sh | 4 +++-
 t/t6000-rev-list-misc.sh      | 7 +++++--
 t/t9700/test.pl               | 6 +++---
 t/test-lib.sh                 | 3 +++
 4 files changed, 14 insertions(+), 6 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] t6000: use test_tick consistently
  2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
@ 2020-07-09 20:34 ` Jeff King
  2020-07-09 20:35 ` [PATCH 2/4] t9700: loosen ident timezone regex Jeff King
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-07-09 20:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys

The first two commits created in t6000 are done without test_tick,
meaning they use the current system clock. After that, we create one
with test_tick, which means it uses a deterministic time in the past.

The result of the "symleft flag bit is propagated down from tag" test
relies on the output order of commits from git-log, which in turn
depends on these timestamps. So this test is technically dependent on
the system clock time, though in practice it would only matter if your
system clock was set before test_tick's default time (which is in 2005).

However, let's use test_tick consistently for those early commits (and
update the expected output to match). This makes the test deterministic,
which is in turn easier to reason about and debug.

Note that there's also a fourth commit here, and it does not use
test_tick. It does have a deterministic timestamp because of the prior
use of test_tick in the script, but it will always be the same time as
the third commit. Let's use test_tick here, too, for consistency.  The
matching timestamps between the third and fourth commit are not an
important part of the test.

We could also use test_commit in all of these cases, as it runs
test_tick under the hood. But it would be awkward to do so, as these
tests diverge from the usual test_commit patterns (e.g., by creating
multiple files in a single commit).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6000-rev-list-misc.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 3dc1ad8f71..3bb0e4ff8f 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -8,6 +8,7 @@ test_expect_success setup '
 	echo content1 >wanted_file &&
 	echo content2 >unwanted_file &&
 	git add wanted_file unwanted_file &&
+	test_tick &&
 	git commit -m one
 '
 
@@ -21,6 +22,7 @@ test_expect_success 'rev-list --objects with pathspecs and deeper paths' '
 	mkdir foo &&
 	>foo/file &&
 	git add foo/file &&
+	test_tick &&
 	git commit -m two &&
 
 	git rev-list --objects HEAD -- foo >output &&
@@ -69,6 +71,7 @@ test_expect_success '--no-object-names and --object-names are last-one-wins' '
 '
 
 test_expect_success 'rev-list A..B and rev-list ^A B are the same' '
+	test_tick &&
 	git commit --allow-empty -m another &&
 	git tag -a -m "annotated" v1.0 &&
 	git rev-list --objects ^v1.0^ v1.0 >expect &&
@@ -84,10 +87,10 @@ test_expect_success 'propagate uninteresting flag down correctly' '
 test_expect_success 'symleft flag bit is propagated down from tag' '
 	git log --format="%m %s" --left-right v1.0...master >actual &&
 	cat >expect <<-\EOF &&
-	> two
-	> one
 	< another
 	< that
+	> two
+	> one
 	EOF
 	test_cmp expect actual
 '
-- 
2.27.0.730.gcc195a083d


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/4] t9700: loosen ident timezone regex
  2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
  2020-07-09 20:34 ` [PATCH 1/4] t6000: use test_tick consistently Jeff King
@ 2020-07-09 20:35 ` Jeff King
  2020-07-09 20:39 ` [PATCH 3/4] t5539: make timestamp requirements more explicit Jeff King
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-07-09 20:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys

A few of the perl tests in t9700 ask for the author and committer ident,
and then make sure we get something sensible. For the timestamp portion,
we just match [0-9]+, because the actual value will depend on when the
test is run. However, we do require that the timezone be "+0000". This
works reliably because we set $TZ in test-lib.sh. But in preparation for
changing the default timezone, let's be a bit more flexible. We don't
actually care about the exact value here, just that we were able to get
a sensible output from the perl module's access methods.

Signed-off-by: Jeff King <peff@peff.net>
---
Alternatively, we could use test_tick here and then make sure we got the
expected timestamp. Or revisit it after patch 4 and use the default date
from there. But I think the intent of the tests is fulfilled by
remaining flexible.

 t/t9700/test.pl | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t9700/test.pl b/t/t9700/test.pl
index 34cd01366f..c59a015f89 100755
--- a/t/t9700/test.pl
+++ b/t/t9700/test.pl
@@ -59,15 +59,15 @@ sub adjust_dirsep {
 open STDERR, ">&", $tmpstderr or die "cannot restore STDERR";
 
 # ident
-like($r->ident("aUthor"), qr/^A U Thor <author\@example.com> [0-9]+ \+0000$/,
+like($r->ident("aUthor"), qr/^A U Thor <author\@example.com> [0-9]+ [+-]\d{4}$/,
      "ident scalar: author (type)");
-like($r->ident("cOmmitter"), qr/^C O Mitter <committer\@example.com> [0-9]+ \+0000$/,
+like($r->ident("cOmmitter"), qr/^C O Mitter <committer\@example.com> [0-9]+ [+-]\d{4}$/,
      "ident scalar: committer (type)");
 is($r->ident("invalid"), "invalid", "ident scalar: invalid ident string (no parsing)");
 my ($name, $email, $time_tz) = $r->ident('author');
 is_deeply([$name, $email], ["A U Thor", "author\@example.com"],
 	 "ident array: author");
-like($time_tz, qr/[0-9]+ \+0000/, "ident array: author");
+like($time_tz, qr/[0-9]+ [+-]\d{4}/, "ident array: author");
 is_deeply([$r->ident("Name <email> 123 +0000")], ["Name", "email", "123 +0000"],
 	  "ident array: ident string");
 is_deeply([$r->ident("invalid")], [], "ident array: invalid ident string");
-- 
2.27.0.730.gcc195a083d


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] t5539: make timestamp requirements more explicit
  2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
  2020-07-09 20:34 ` [PATCH 1/4] t6000: use test_tick consistently Jeff King
  2020-07-09 20:35 ` [PATCH 2/4] t9700: loosen ident timezone regex Jeff King
@ 2020-07-09 20:39 ` Jeff King
  2020-07-09 20:42 ` [PATCH 4/4] test-lib: set deterministic default author/committer date Jeff King
  2020-07-10 22:02 ` [PATCH 0/4] deterministic commit timestamps in tests Junio C Hamano
  4 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-07-09 20:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys

The test for "no shallow lines after receiving ACK ready" is very
sensitive to the timestamps of the commits we create. It's looking for
the fetch negotiation to send a "ready", which in turn depends on the
order in which we traverse commits during the negotiation.

It works reliably now because the base commit "7" is created without
test_commit, and thus gets a commit time matching the current system
clock. Whereas the new commits created in this test do use test_commit,
and get the usual test_tick time from 2005. So the fetch into the
"clone" repository results in a commit graph like this (I omitted some
of the "unrelated" commits for clarity; they're all just a sequence of
test_ticks):

  $ git log --graph --format='%ct %s %d'
  * 1112912953 new  (origin/master, origin/HEAD)
  * 1594322236 7  (grafted, master)
  * 1112912893 unrelated15  (origin/unrelated15, unrelated15)
  [...]
  * 1112912053 unrelated1  (origin/unrelated1, unrelated1)
  * 1112911993 new-too  (HEAD -> newnew, tag: new-too)

The important things to see are:

  - "7" is way in the future compared to the other commits

  - "new-too" in the fetching repo is older than "new" (and its
    "unrelated" ancestors) in the shallow repo

If we change our "setup shallow clone" step to use test_tick, too (and
get rid of the dependency on the system clock), then the test will fail.
The resulting graph looks like this:

  $ git log --graph --format='%ct %s %d'
  * 1112913373 new  (origin/master, origin/HEAD)
  * 1112912353 7  (grafted, master)
  * 1112913313 unrelated15  (origin/unrelated15, unrelated15)
  [...]
  * 1112912473 unrelated1  (origin/unrelated1, unrelated1)
  * 1112912413 new-too  (HEAD -> newnew, tag: new-too)

Our "new-too" is still older than "new" and "unrelated", but now "7" is
older than all of them (because it advanced test_tick, which the other
tests built on top of). In the original, we advertised "7" as the first
"have" before anything else, but now "new-too" is more recent. You'd see
the same thing in the unlikely event that the system clock was set
before our test_tick default in 2005.

Let's make the timing requirements more explicit. The important thing is
that the client advertise all of its shared commits first, before
presenting its unique "new-too" commit. We can do that and get rid of
the system clock dependency at the same time by creating all of the
shared commits around time X (using test_tick), and then creating
"new-too" with some time long before X. The resulting graph looks like
this:

  $ git log --graph --format='%ct %s %d'
  * 1500001380 new  (origin/master, origin/HEAD)
  * 1500000420 7  (grafted, master)
  * 1500001320 unrelated15  (origin/unrelated15, unrelated15)
  [...]
  * 1500000480 unrelated1  (origin/unrelated1, unrelated1)
  * 1400000060 new-too  (HEAD -> newnew, tag: new-too)

That also lets us get rid of the hacky test_tick added by f0e802ca20
(t5539: update a flaky test, 2014-07-14). That was clearly dancing
around the same problem, but only addressed the relationship between
commits created in the two subshells (which did use test_tick, but
overlapped because increments of test_tick in subshells are lost). Now
that we're using consistent and well-placed times for both lines of
history, we don't have to care about a one-tick difference between the
two sides.

Signed-off-by: Jeff King <peff@peff.net>
---
Curiously, the test still passes if "7" has the same timestamp as
"new-too"! That's why I didn't notice the failure in the original
thread, where I looked at setting the deterministic default time to the
default test_tick time. I didn't dig in, but I believe it only worked
because we currently use insertion order to break timestamp ties. And
since "master" sorts before "newnew", we'd queue "7" before "new-too".

 t/t5539-fetch-http-shallow.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t5539-fetch-http-shallow.sh b/t/t5539-fetch-http-shallow.sh
index c0d02dee89..82aa99ae87 100755
--- a/t/t5539-fetch-http-shallow.sh
+++ b/t/t5539-fetch-http-shallow.sh
@@ -9,10 +9,12 @@ start_httpd
 commit() {
 	echo "$1" >tracked &&
 	git add tracked &&
+	test_tick &&
 	git commit -m "$1"
 }
 
 test_expect_success 'setup shallow clone' '
+	test_tick=1500000000 &&
 	commit 1 &&
 	commit 2 &&
 	commit 3 &&
@@ -48,7 +50,6 @@ EOF
 test_expect_success 'no shallow lines after receiving ACK ready' '
 	(
 		cd shallow &&
-		test_tick &&
 		for i in $(test_seq 15)
 		do
 			git checkout --orphan unrelated$i &&
@@ -66,6 +67,7 @@ test_expect_success 'no shallow lines after receiving ACK ready' '
 	(
 		cd clone &&
 		git checkout --orphan newnew &&
+		test_tick=1400000000 &&
 		test_commit new-too &&
 		# NEEDSWORK: If the overspecification of the expected result is reduced, we
 		# might be able to run this test in all protocol versions.
-- 
2.27.0.730.gcc195a083d


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] test-lib: set deterministic default author/committer date
  2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
                   ` (2 preceding siblings ...)
  2020-07-09 20:39 ` [PATCH 3/4] t5539: make timestamp requirements more explicit Jeff King
@ 2020-07-09 20:42 ` Jeff King
  2020-07-10 22:02 ` [PATCH 0/4] deterministic commit timestamps in tests Junio C Hamano
  4 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-07-09 20:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys

We always set the name and email for committer and author idents to make
the test suite more deterministic, but not timestamps. Many scripts use
test_tick to get consistent and sensibly incrementing timestamps as they
create commits. But other scripts don't particularly care about the
timestamp, and are happy to use whatever the current system time is.

This non-determinism can be annoying:

  - when debugging a test, comparing results between two runs can be
    difficult, because the commit ids change

  - this can sometimes cause tests to be racy. E.g., traversal order
    depends on timestamp order. Even in a well-ordered set of commands,
    because our timestamp granularity is one second, two commits might
    sometimes have the same timestamp and sometimes differ.

Let's set a default timestamp for all scripts to use. Any that use
test_tick already will be unaffected (because their first test_tick call
will overwrite our default), but it will make things a bit more
deterministic for those that don't.

We should be able to choose any time we want here. I picked this one
because:

  - it differs from the initial test_tick default, which may make it
    easier to distinguish when debugging tests. I picked "April 1st
    13:14:15" in the hope that it might stand out.

  - it's slightly before the test_tick default. Some tests create some
    commits before the first call to test_tick, so using an older
    timestamps for those makes sense chronologically. Note that this
    isn't how things currently work (where system times are usually more
    recent than test_tick), but that also allows us to flush out a few
    hidden timestamp dependencies (like the one recently fixed in
    t5539).

  - we could likewise pick any timezone we want. Choosing +0000 would
    have required fixing up fewer tests, but we're more likely to turn
    up interesting cases by not matching $TZ exactly. And since
    test_tick already checks "-0700", let's try something in the "+"
    zone range for variety.

It's possible that the non-deterministic times could help flush out bugs
(e.g., if something broke when the clock flipped over to 2021, our test
suite would let us know). But historically that hasn't been the case;
all time-dependent outcomes we've seen turned out to be accidentally
flaky tests (which we fixed by using test_tick). If we do want to cover
handling the current time, we should dedicate one script to doing so,
and have it unset GIT_COMMITTER_DATE explicitly.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 618a7c8d5b..ba224c86f5 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -441,15 +441,18 @@ TEST_AUTHOR_LOCALNAME=author
 TEST_AUTHOR_DOMAIN=example.com
 GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN}
 GIT_AUTHOR_NAME='A U Thor'
+GIT_AUTHOR_DATE='1112354055 +0200'
 TEST_COMMITTER_LOCALNAME=committer
 TEST_COMMITTER_DOMAIN=example.com
 GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN}
 GIT_COMMITTER_NAME='C O Mitter'
+GIT_COMMITTER_DATE='1112354055 +0200'
 GIT_MERGE_VERBOSITY=5
 GIT_MERGE_AUTOEDIT=no
 export GIT_MERGE_VERBOSITY GIT_MERGE_AUTOEDIT
 export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
 export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
+export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 export EDITOR
 
 # Tests using GIT_TRACE typically don't want <timestamp> <file>:<line> output
-- 
2.27.0.730.gcc195a083d

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
                   ` (3 preceding siblings ...)
  2020-07-09 20:42 ` [PATCH 4/4] test-lib: set deterministic default author/committer date Jeff King
@ 2020-07-10 22:02 ` Junio C Hamano
  2020-07-14 12:31   ` Jeff King
  4 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-07-10 22:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Han-Wen Nienhuys

Jeff King <peff@peff.net> writes:

> Most tests use test_tick or test_commit to get deterministic timestamps
> in commits. Some don't because they don't care about the timestamps. But
> they can still be annoying to debug, since the output changes from run
> to run.
>
> This series sets a default timestamp for those scripts, in a way that
> doesn't interfere with anybody using test_tick. The change is in the
> final commit. The others are preparatory to deal with fallout in the
> test suite. Normally fallout would give me pause about whether this is a
> good idea, but a) there's not much of it and b) I think it helped shake
> out questionable assumptions in those tests.
>
> This a replacement for the patches being discussed in:
>
>   https://lore.kernel.org/git/pull.816.git.git.1594149804313.gitgitgadget@gmail.com/
>
> It looks like Junio already picked up one of the fixes as
> jk/t6000-timestamp-fix. The first patch here is identical (so we can
> either drop that branch, or drop the first patch from this series and
> apply on top).
>
>   [1/4]: t6000: use test_tick consistently
>   [2/4]: t9700: loosen ident timezone regex
>   [3/4]: t5539: make timestamp requirements more explicit
>   [4/4]: test-lib: set deterministic default author/committer date
>
>  t/t5539-fetch-http-shallow.sh | 4 +++-
>  t/t6000-rev-list-misc.sh      | 7 +++++--
>  t/t9700/test.pl               | 6 +++---
>  t/test-lib.sh                 | 3 +++
>  4 files changed, 14 insertions(+), 6 deletions(-)
>
> -Peff

I have this queued on top for today's integration run.  The last
step is something worth doing in the longer term, but certainly not
for the upcoming release ;-).

-- >8 --
Subject: [PATCH] BANDAID: t9100.17 & t9100.18

---
 t/t9100-git-svn-basic.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
index 2c309a57d9..33ea813585 100755
--- a/t/t9100-git-svn-basic.sh
+++ b/t/t9100-git-svn-basic.sh
@@ -8,6 +8,8 @@ GIT_SVN_LC_ALL=${LC_ALL:-$LANG}
 
 . ./lib-git-svn.sh
 
+unset GIT_AUTHOR_DATE GIT_COMMITTER_DATE
+
 case "$GIT_SVN_LC_ALL" in
 *.UTF-8)
 	test_set_prereq UTF8
-- 
2.28.0-rc0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-10 22:02 ` [PATCH 0/4] deterministic commit timestamps in tests Junio C Hamano
@ 2020-07-14 12:31   ` Jeff King
  2020-07-14 12:33     ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-07-14 12:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, git, Han-Wen Nienhuys

On Fri, Jul 10, 2020 at 03:02:01PM -0700, Junio C Hamano wrote:

> I have this queued on top for today's integration run.  The last
> step is something worth doing in the longer term, but certainly not
> for the upcoming release ;-).
> 
> -- >8 --
> Subject: [PATCH] BANDAID: t9100.17 & t9100.18

Oof, thanks for catching. I don't usually have subversion on my system
at all. I have been trying to pay more attention to our CI, but I have
another topic I've been working on that has been causing failures in my
integration runs, so I didn't notice this one.

It took me a while to untangle just what's happening in the test, but I
think your "unset" workaround is actually what we want. The patch below
could go second-to-last in jk/tests-timestamp-fix, before the actual
switch to setting GIT_COMMITTER_DATE.

-- >8 --
Subject: [PATCH] t9100: explicitly unset GIT_COMMITTER_DATE

The early part of t9100 creates an unusual "doubled" history in the
"git-svn" ref. When we get to t9100.17, it looks like this:

  $ git log --oneline --graph git-svn
  [...]
  *   efd0303 detect node change from file to directory #2
  |\
  * | 3e727c0 detect node change from file to directory #2
  |/
  *   3b00468 try a deep --rmdir with a commit
  |\
  * | b4832d8 try a deep --rmdir with a commit
  |/
  * f0d7bd5 import for git svn

Each commit we make with "git commit" is paired with one from "git svn
set-tree", with the latter as a merge of the first and its grandparent.

Later, t9100.17 wants to check that "git svn fetch" gets the same trees.
And it does, but just one copy of each. So it uses rev-list to get the
tree of each commit and pipes it to "uniq" to drop the duplicates. Our
input isn't sorted, but it will find adjacent duplicates. This works
reliably because the order of commits from rev-list always shows the
duplicates next to each other. For any one of those merges, we could
choose to show its duplicate or the grandparent first. But barring
clocks running backwards, the duplicate will always have a time equal to
or greater than the grandparent. Even if equal, we break ties by showing
the first-parent first, so the duplicates remain adjacent.

But this would break if the timestamps stopped moving in chronological
order. Normally we would rely on test_tick for this, but we have _two_
sources of time here:

  - "git commit" creates one commit based on GIT_COMMITTER_DATE (which
    respects test_tick)

  - the "svn set-tree" one is based on subversion, which does not have
    an easy way to specify a timestamp

So using test_tick actually breaks the test, because now the duplicates
are far in the past, and we'll show the grandparent before the
duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in
all scripts will break it.

We _could_ fix this by sorting before removing duplicates, but
presumably it's a useful part of the test to make sure the trees appear
in the same order in both spots. Likewise, we could use something like:

  perl -ne 'print unless $seen{$_}++'

to remove duplicates without impacting the order. But that doesn't work
either, because there are actually multiple (non-duplicate) commits with
the same trees (we change a file mode and then change it back). So we'd
actually have to de-duplicate the combination of subject and tree. Which
then further throws off t9100.18, which compares the tree hashes
exactly; we'd have to strip the result back down.

Since this test _isn't_ buggy, the simplest thing is to just work around
the proposed change by documenting our expectation that git-created
commits are correctly interleaved using the current time.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t9100-git-svn-basic.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
index 9f2d19ecc4..b80952f0ac 100755
--- a/t/t9100-git-svn-basic.sh
+++ b/t/t9100-git-svn-basic.sh
@@ -8,6 +8,10 @@ GIT_SVN_LC_ALL=${LC_ALL:-$LANG}
 
 . ./lib-git-svn.sh
 
+# Make sure timestamps of commits created by Git interleave
+# with those created by "svn set-tree".
+unset GIT_COMMITTER_DATE
+
 case "$GIT_SVN_LC_ALL" in
 *.UTF-8)
 	test_set_prereq UTF8
-- 
2.28.0.rc0.402.g7b15ae678a


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-14 12:31   ` Jeff King
@ 2020-07-14 12:33     ` Jeff King
  2020-07-14 21:47       ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-07-14 12:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, git, Han-Wen Nienhuys

On Tue, Jul 14, 2020 at 08:31:42AM -0400, Jeff King wrote:

> We _could_ fix this by sorting before removing duplicates, but
> presumably it's a useful part of the test to make sure the trees appear
> in the same order in both spots. Likewise, we could use something like:
> 
>   perl -ne 'print unless $seen{$_}++'
> 
> to remove duplicates without impacting the order. But that doesn't work
> either, because there are actually multiple (non-duplicate) commits with
> the same trees (we change a file mode and then change it back). So we'd
> actually have to de-duplicate the combination of subject and tree. Which
> then further throws off t9100.18, which compares the tree hashes
> exactly; we'd have to strip the result back down.

Actually, that last one isn't _too_ bad. It looks something like this:

diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
index b80952f0ac..4502c5f97d 100755
--- a/t/t9100-git-svn-basic.sh
+++ b/t/t9100-git-svn-basic.sh
@@ -204,8 +204,10 @@ GIT_SVN_ID=alt
 export GIT_SVN_ID
 test_expect_success "$name" \
     'git svn init "$svnrepo" && git svn fetch &&
-     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
-     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
+     git log --format="tree %T %s" remotes/git-svn |
+	perl -lne "print unless \$seen{\$_}++" |
+	cut -d" " -f1-2 >a &&
+     git log --format="tree %T" remotes/alt >b &&
      test_cmp a b'
 
 name='check imported tree checksums expected tree checksums'

It does lose a little bit of information, which is that in the original
we confirmed that the duplicates were always next to each other. But I'm
not sure that's important. We'd get confused if the same subject
appeared twice, but all of the commits have distinct hard-coded
subjects in the earlier tests.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-14 12:33     ` Jeff King
@ 2020-07-14 21:47       ` Eric Wong
  2020-07-15  7:42         ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2020-07-14 21:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Han-Wen Nienhuys

Jeff King <peff@peff.net> wrote:
> On Tue, Jul 14, 2020 at 08:31:42AM -0400, Jeff King wrote:
> 
> > We _could_ fix this by sorting before removing duplicates, but
> > presumably it's a useful part of the test to make sure the trees appear
> > in the same order in both spots. Likewise, we could use something like:
> > 
> >   perl -ne 'print unless $seen{$_}++'
> > 
> > to remove duplicates without impacting the order. But that doesn't work
> > either, because there are actually multiple (non-duplicate) commits with
> > the same trees (we change a file mode and then change it back). So we'd
> > actually have to de-duplicate the combination of subject and tree. Which
> > then further throws off t9100.18, which compares the tree hashes
> > exactly; we'd have to strip the result back down.

Right, log order matters, so sorting isn't ideal.

> Actually, that last one isn't _too_ bad. It looks something like this:
> 
> diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
> index b80952f0ac..4502c5f97d 100755
> --- a/t/t9100-git-svn-basic.sh
> +++ b/t/t9100-git-svn-basic.sh
> @@ -204,8 +204,10 @@ GIT_SVN_ID=alt
>  export GIT_SVN_ID
>  test_expect_success "$name" \
>      'git svn init "$svnrepo" && git svn fetch &&
> -     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
> -     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
> +     git log --format="tree %T %s" remotes/git-svn |
> +	perl -lne "print unless \$seen{\$_}++" |
> +	cut -d" " -f1-2 >a &&
> +     git log --format="tree %T" remotes/alt >b &&
>       test_cmp a b'

The future of non-strict one-liners with Perl7 on the horizon
seems uncertain :<   cut is unnecessary either way, but
I suggest awk, here:

	awk "!seen[\$0]++ { print \$1, \$2 }'

>  name='check imported tree checksums expected tree checksums'
> 
> It does lose a little bit of information, which is that in the original
> we confirmed that the duplicates were always next to each other. But I'm
> not sure that's important. We'd get confused if the same subject
> appeared twice, but all of the commits have distinct hard-coded
> subjects in the earlier tests.

Yeah, but I think it's fine.  It's been a while since I wrote
this

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-14 21:47       ` Eric Wong
@ 2020-07-15  7:42         ` Jeff King
  2020-07-15 14:50           ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-07-15  7:42 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Han-Wen Nienhuys

On Tue, Jul 14, 2020 at 09:47:28PM +0000, Eric Wong wrote:

> > -     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
> > -     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
> > +     git log --format="tree %T %s" remotes/git-svn |
> > +	perl -lne "print unless \$seen{\$_}++" |
> > +	cut -d" " -f1-2 >a &&
> > +     git log --format="tree %T" remotes/alt >b &&
> >       test_cmp a b'
> 
> The future of non-strict one-liners with Perl7 on the horizon
> seems uncertain :<   cut is unnecessary either way, but
> I suggest awk, here:
> 
> 	awk "!seen[\$0]++ { print \$1, \$2 }'

Thanks, that is nicer.

> >  name='check imported tree checksums expected tree checksums'
> > 
> > It does lose a little bit of information, which is that in the original
> > we confirmed that the duplicates were always next to each other. But I'm
> > not sure that's important. We'd get confused if the same subject
> > appeared twice, but all of the commits have distinct hard-coded
> > subjects in the earlier tests.
> 
> Yeah, but I think it's fine.  It's been a while since I wrote
> this

OK. If you're on board, then I think doing it this way is slightly
nicer, as it's less likely to be confusing or bite somebody in the
future.

Here's a revised patch (I see Junio already picked up the other fix; if
that ends up being merged instead, that's not the end of the world).

(compared to the earlier version, you can skip everything in the commit
message before "One fix would be...").

-- >8 --
Subject: [PATCH] t9100: stop depending on commit timestamps

The early part of t9100 creates an unusual "doubled" history in the
"git-svn" ref. When we get to t9100.17, it looks like this:

  $ git log --oneline --graph git-svn
  [...]
  *   efd0303 detect node change from file to directory #2
  |\
  * | 3e727c0 detect node change from file to directory #2
  |/
  *   3b00468 try a deep --rmdir with a commit
  |\
  * | b4832d8 try a deep --rmdir with a commit
  |/
  * f0d7bd5 import for git svn

Each commit we make with "git commit" is paired with one from "git svn
set-tree", with the latter as a merge of the first and its grandparent.

Later, t9100.17 wants to check that "git svn fetch" gets the same trees.
And it does, but just one copy of each. So it uses rev-list to get the
tree of each commit and pipes it to "uniq" to drop the duplicates. Our
input isn't sorted, but it will find adjacent duplicates. This works
reliably because the order of commits from rev-list always shows the
duplicates next to each other. For any one of those merges, we could
choose to show its duplicate or the grandparent first. But barring
clocks running backwards, the duplicate will always have a time equal to
or greater than the grandparent. Even if equal, we break ties by showing
the first-parent first, so the duplicates remain adjacent.

But this would break if the timestamps stopped moving in chronological
order. Normally we would rely on test_tick for this, but we have _two_
sources of time here:

  - "git commit" creates one commit based on GIT_COMMITTER_DATE (which
    respects test_tick)

  - the "svn set-tree" one is based on subversion, which does not have
    an easy way to specify a timestamp

So using test_tick actually breaks the test, because now the duplicates
are far in the past, and we'll show the grandparent before the
duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in
all scripts will break it.

One fix would be to sort the list of trees before removing duplicates,
but that loses information:

  - we do care that the fetched history is in the same order

  - there's a tree which appears twice in the history, and we'd want to
    make sure that it's there both times

So instead, let's de-duplicate using a hash (preserving the order), and
drop only lines with identical trees and subjects (preserving the tree
which appears twice, since it has different subjects each time).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t9100-git-svn-basic.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
index 9f2d19ecc4..3055943a22 100755
--- a/t/t9100-git-svn-basic.sh
+++ b/t/t9100-git-svn-basic.sh
@@ -200,8 +200,9 @@ GIT_SVN_ID=alt
 export GIT_SVN_ID
 test_expect_success "$name" \
     'git svn init "$svnrepo" && git svn fetch &&
-     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
-     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
+     git log --format="tree %T %s" remotes/git-svn |
+	awk "!seen[\$0]++ { print \$1, \$2 }" >a &&
+     git log --format="tree %T" alt >b &&
      test_cmp a b'
 
 name='check imported tree checksums expected tree checksums'
-- 
2.28.0.rc0.394.ga62ae196ad


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-15  7:42         ` Jeff King
@ 2020-07-15 14:50           ` Junio C Hamano
  2020-07-15 15:04             ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-07-15 14:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Wong, git, Han-Wen Nienhuys

Jeff King <peff@peff.net> writes:

> Here's a revised patch (I see Junio already picked up the other fix; if
> that ends up being merged instead, that's not the end of the world).
>
> (compared to the earlier version, you can skip everything in the commit
> message before "One fix would be...").

Heh, it already is in 'next' and I do not think it is worth
rewinding and rebuilding to cause downstream folks an additional
trouble.  I however think this updated solution is nicer and do not
mind building on top i.e. "while an earlier change did unbreak the
svn tests, relying on the current timestamps is not nice and here is
an update".

Having said that, I think it is more urgent to address the "ouch, we
made it clera that repos with extensions.worktreeconfig set without
marking them as repoformat v1 are broken and without giving users
enough hints to recover" issue discussed elsewhere before -rc1 (and
for that reason I do not think I can tag -rc1 today), so I'd
backburner it.  This topic won't merge down from 'next' until final
anyway.

Thanks.


> -- >8 --
> Subject: [PATCH] t9100: stop depending on commit timestamps
>
> The early part of t9100 creates an unusual "doubled" history in the
> "git-svn" ref. When we get to t9100.17, it looks like this:
>
>   $ git log --oneline --graph git-svn
>   [...]
>   *   efd0303 detect node change from file to directory #2
>   |\
>   * | 3e727c0 detect node change from file to directory #2
>   |/
>   *   3b00468 try a deep --rmdir with a commit
>   |\
>   * | b4832d8 try a deep --rmdir with a commit
>   |/
>   * f0d7bd5 import for git svn
>
> Each commit we make with "git commit" is paired with one from "git svn
> set-tree", with the latter as a merge of the first and its grandparent.
>
> Later, t9100.17 wants to check that "git svn fetch" gets the same trees.
> And it does, but just one copy of each. So it uses rev-list to get the
> tree of each commit and pipes it to "uniq" to drop the duplicates. Our
> input isn't sorted, but it will find adjacent duplicates. This works
> reliably because the order of commits from rev-list always shows the
> duplicates next to each other. For any one of those merges, we could
> choose to show its duplicate or the grandparent first. But barring
> clocks running backwards, the duplicate will always have a time equal to
> or greater than the grandparent. Even if equal, we break ties by showing
> the first-parent first, so the duplicates remain adjacent.
>
> But this would break if the timestamps stopped moving in chronological
> order. Normally we would rely on test_tick for this, but we have _two_
> sources of time here:
>
>   - "git commit" creates one commit based on GIT_COMMITTER_DATE (which
>     respects test_tick)
>
>   - the "svn set-tree" one is based on subversion, which does not have
>     an easy way to specify a timestamp
>
> So using test_tick actually breaks the test, because now the duplicates
> are far in the past, and we'll show the grandparent before the
> duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in
> all scripts will break it.
>
> One fix would be to sort the list of trees before removing duplicates,
> but that loses information:
>
>   - we do care that the fetched history is in the same order
>
>   - there's a tree which appears twice in the history, and we'd want to
>     make sure that it's there both times
>
> So instead, let's de-duplicate using a hash (preserving the order), and
> drop only lines with identical trees and subjects (preserving the tree
> which appears twice, since it has different subjects each time).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t9100-git-svn-basic.sh | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
> index 9f2d19ecc4..3055943a22 100755
> --- a/t/t9100-git-svn-basic.sh
> +++ b/t/t9100-git-svn-basic.sh
> @@ -200,8 +200,9 @@ GIT_SVN_ID=alt
>  export GIT_SVN_ID
>  test_expect_success "$name" \
>      'git svn init "$svnrepo" && git svn fetch &&
> -     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
> -     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
> +     git log --format="tree %T %s" remotes/git-svn |
> +	awk "!seen[\$0]++ { print \$1, \$2 }" >a &&
> +     git log --format="tree %T" alt >b &&
>       test_cmp a b'
>  
>  name='check imported tree checksums expected tree checksums'

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-15 14:50           ` Junio C Hamano
@ 2020-07-15 15:04             ` Junio C Hamano
  2020-07-16 10:43               ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-07-15 15:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Wong, git, Han-Wen Nienhuys

Junio C Hamano <gitster@pobox.com> writes:

> Heh, it already is in 'next' and I do not think it is worth
> rewinding and rebuilding to cause downstream folks an additional
> trouble.  I however think this updated solution is nicer and do not
> mind building on top i.e. "while an earlier change did unbreak the
> svn tests, relying on the current timestamps is not nice and here is
> an update".
>
> Having said that, I think it is more urgent to address the "ouch, we
> made it clera that repos with extensions.worktreeconfig set without
> marking them as repoformat v1 are broken and without giving users
> enough hints to recover" issue discussed elsewhere before -rc1 (and
> for that reason I do not think I can tag -rc1 today), so I'd
> backburner it.  This topic won't merge down from 'next' until final
> anyway.

Before I forget, here is what could be applied on top of the "fix
the starting timestamp for the world" step as a "further polishing
on top of the completed series".

-- >8 --
From: Jeff King <peff@peff.net>
Date: Wed, 15 Jul 2020 03:42:50 -0400
Subject: [PATCH] t9100: stop depending on commit timestamps

An earlier "fix" to this script gave up updating it not to rely on
the current time because we cannot control what timestamp subversion
gives its commits.  We however could solve the issue in a different
way and still use deterministic timestamps on Git commits.

One fix would be to sort the list of trees before removing duplicates,
but that loses information:

  - we do care that the fetched history is in the same order

  - there's a tree which appears twice in the history, and we'd want to
    make sure that it's there both times

So instead, let's de-duplicate using a hash (preserving the order), and
drop only lines with identical trees and subjects (preserving the tree
which appears twice, since it has different subjects each time).

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t9100-git-svn-basic.sh | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
index a89d338aa7..8dd9645ce5 100755
--- a/t/t9100-git-svn-basic.sh
+++ b/t/t9100-git-svn-basic.sh
@@ -8,10 +8,6 @@ GIT_SVN_LC_ALL=${LC_ALL:-$LANG}
 
 . ./lib-git-svn.sh
 
-# Make sure timestamps of commits created by Git interleave
-# with those created by "svn set-tree".
-unset GIT_COMMITTER_DATE
-
 case "$GIT_SVN_LC_ALL" in
 *.UTF-8)
 	test_set_prereq UTF8
@@ -204,8 +200,9 @@ GIT_SVN_ID=alt
 export GIT_SVN_ID
 test_expect_success "$name" \
     'git svn init "$svnrepo" && git svn fetch &&
-     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
-     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
+     git log --format="tree %T %s" remotes/git-svn |
+	awk "!seen[\$0]++ { print \$1, \$2 }" >a &&
+     git log --format="tree %T" alt >b &&
      test_cmp a b'
 
 name='check imported tree checksums expected tree checksums'
-- 
2.28.0-rc0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] deterministic commit timestamps in tests
  2020-07-15 15:04             ` Junio C Hamano
@ 2020-07-16 10:43               ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-07-16 10:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, git, Han-Wen Nienhuys

On Wed, Jul 15, 2020 at 08:04:52AM -0700, Junio C Hamano wrote:

> Before I forget, here is what could be applied on top of the "fix
> the starting timestamp for the world" step as a "further polishing
> on top of the completed series".
> 
> -- >8 --
> From: Jeff King <peff@peff.net>
> Date: Wed, 15 Jul 2020 03:42:50 -0400
> Subject: [PATCH] t9100: stop depending on commit timestamps

Thanks, this looks good and sets up to merge it post-v2.28.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-07-16 10:43 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
2020-07-09 20:34 ` [PATCH 1/4] t6000: use test_tick consistently Jeff King
2020-07-09 20:35 ` [PATCH 2/4] t9700: loosen ident timezone regex Jeff King
2020-07-09 20:39 ` [PATCH 3/4] t5539: make timestamp requirements more explicit Jeff King
2020-07-09 20:42 ` [PATCH 4/4] test-lib: set deterministic default author/committer date Jeff King
2020-07-10 22:02 ` [PATCH 0/4] deterministic commit timestamps in tests Junio C Hamano
2020-07-14 12:31   ` Jeff King
2020-07-14 12:33     ` Jeff King
2020-07-14 21:47       ` Eric Wong
2020-07-15  7:42         ` Jeff King
2020-07-15 14:50           ` Junio C Hamano
2020-07-15 15:04             ` Junio C Hamano
2020-07-16 10:43               ` Jeff King

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).