git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Eric Wong <e@80x24.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Han-Wen Nienhuys <hanwen@google.com>
Subject: Re: [PATCH 0/4] deterministic commit timestamps in tests
Date: Wed, 15 Jul 2020 03:42:50 -0400	[thread overview]
Message-ID: <20200715074250.GB3249056@coredump.intra.peff.net> (raw)
In-Reply-To: <20200714214728.GA21660@dcvr>

On Tue, Jul 14, 2020 at 09:47:28PM +0000, Eric Wong wrote:

> > -     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
> > -     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
> > +     git log --format="tree %T %s" remotes/git-svn |
> > +	perl -lne "print unless \$seen{\$_}++" |
> > +	cut -d" " -f1-2 >a &&
> > +     git log --format="tree %T" remotes/alt >b &&
> >       test_cmp a b'
> 
> The future of non-strict one-liners with Perl7 on the horizon
> seems uncertain :<   cut is unnecessary either way, but
> I suggest awk, here:
> 
> 	awk "!seen[\$0]++ { print \$1, \$2 }'

Thanks, that is nicer.

> >  name='check imported tree checksums expected tree checksums'
> > 
> > It does lose a little bit of information, which is that in the original
> > we confirmed that the duplicates were always next to each other. But I'm
> > not sure that's important. We'd get confused if the same subject
> > appeared twice, but all of the commits have distinct hard-coded
> > subjects in the earlier tests.
> 
> Yeah, but I think it's fine.  It's been a while since I wrote
> this

OK. If you're on board, then I think doing it this way is slightly
nicer, as it's less likely to be confusing or bite somebody in the
future.

Here's a revised patch (I see Junio already picked up the other fix; if
that ends up being merged instead, that's not the end of the world).

(compared to the earlier version, you can skip everything in the commit
message before "One fix would be...").

-- >8 --
Subject: [PATCH] t9100: stop depending on commit timestamps

The early part of t9100 creates an unusual "doubled" history in the
"git-svn" ref. When we get to t9100.17, it looks like this:

  $ git log --oneline --graph git-svn
  [...]
  *   efd0303 detect node change from file to directory #2
  |\
  * | 3e727c0 detect node change from file to directory #2
  |/
  *   3b00468 try a deep --rmdir with a commit
  |\
  * | b4832d8 try a deep --rmdir with a commit
  |/
  * f0d7bd5 import for git svn

Each commit we make with "git commit" is paired with one from "git svn
set-tree", with the latter as a merge of the first and its grandparent.

Later, t9100.17 wants to check that "git svn fetch" gets the same trees.
And it does, but just one copy of each. So it uses rev-list to get the
tree of each commit and pipes it to "uniq" to drop the duplicates. Our
input isn't sorted, but it will find adjacent duplicates. This works
reliably because the order of commits from rev-list always shows the
duplicates next to each other. For any one of those merges, we could
choose to show its duplicate or the grandparent first. But barring
clocks running backwards, the duplicate will always have a time equal to
or greater than the grandparent. Even if equal, we break ties by showing
the first-parent first, so the duplicates remain adjacent.

But this would break if the timestamps stopped moving in chronological
order. Normally we would rely on test_tick for this, but we have _two_
sources of time here:

  - "git commit" creates one commit based on GIT_COMMITTER_DATE (which
    respects test_tick)

  - the "svn set-tree" one is based on subversion, which does not have
    an easy way to specify a timestamp

So using test_tick actually breaks the test, because now the duplicates
are far in the past, and we'll show the grandparent before the
duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in
all scripts will break it.

One fix would be to sort the list of trees before removing duplicates,
but that loses information:

  - we do care that the fetched history is in the same order

  - there's a tree which appears twice in the history, and we'd want to
    make sure that it's there both times

So instead, let's de-duplicate using a hash (preserving the order), and
drop only lines with identical trees and subjects (preserving the tree
which appears twice, since it has different subjects each time).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t9100-git-svn-basic.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh
index 9f2d19ecc4..3055943a22 100755
--- a/t/t9100-git-svn-basic.sh
+++ b/t/t9100-git-svn-basic.sh
@@ -200,8 +200,9 @@ GIT_SVN_ID=alt
 export GIT_SVN_ID
 test_expect_success "$name" \
     'git svn init "$svnrepo" && git svn fetch &&
-     git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a &&
-     git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b &&
+     git log --format="tree %T %s" remotes/git-svn |
+	awk "!seen[\$0]++ { print \$1, \$2 }" >a &&
+     git log --format="tree %T" alt >b &&
      test_cmp a b'
 
 name='check imported tree checksums expected tree checksums'
-- 
2.28.0.rc0.394.ga62ae196ad


  reply	other threads:[~2020-07-15  7:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-09 20:33 [PATCH 0/4] deterministic commit timestamps in tests Jeff King
2020-07-09 20:34 ` [PATCH 1/4] t6000: use test_tick consistently Jeff King
2020-07-09 20:35 ` [PATCH 2/4] t9700: loosen ident timezone regex Jeff King
2020-07-09 20:39 ` [PATCH 3/4] t5539: make timestamp requirements more explicit Jeff King
2020-07-09 20:42 ` [PATCH 4/4] test-lib: set deterministic default author/committer date Jeff King
2020-07-10 22:02 ` [PATCH 0/4] deterministic commit timestamps in tests Junio C Hamano
2020-07-14 12:31   ` Jeff King
2020-07-14 12:33     ` Jeff King
2020-07-14 21:47       ` Eric Wong
2020-07-15  7:42         ` Jeff King [this message]
2020-07-15 14:50           ` Junio C Hamano
2020-07-15 15:04             ` Junio C Hamano
2020-07-16 10:43               ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200715074250.GB3249056@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanwen@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).