From: "Eric S. Raymond" <esr@thyrsus.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "Michal Suchánek" <msuchanek@suse.de>,
	"Jason Pyeron" <jpyeron@pdinc.us>,
	git@vger.kernel.org
Subject: Re: Finer timestamps and serialization in git
Date: Mon, 20 May 2019 17:32:03 -0400
Message-ID: <20190520213203.GA110573@thyrsus.com>
In-Reply-To: <7e88805c-7e08-2631-599d-b47a098f1ce1@gmail.com>

Derrick Stolee <stolee@gmail.com>:
> What it sounds like you are doing is piping a 'git fast-import' process into
> reposurgeon, and testing that reposurgeon does the same thing every time.
> Of course this won't be consistent if 'git fast-import' isn't consistent.

It's not actually import that behaves inconsistently; it's export.

That is, if I fast-import a given stream, I get indistinguishable
in-core commit DAGs every time. (It would be pretty alarming if this
weren't true!)

What I have no guarantee of is the other direction.  In a multibranch repo,
fast-export writes out branches in an order I cannot predict and which
appears from the outside to be randomly variable.

> But what you should do instead is store a fixed file from one run of
> 'git fast-import' and send that file to reposurgeon for the repeated test.
> Don't rely on fast-import being consistent and instead use fixed input for
> your test.
> 
> If reposurgeon is providing the input to _and_ consuming the output from
> 'git fast-import', then yes you will need to have at least one integration
> test that runs the full pipeline. But for regression tests covering complicated
> logic in reposurgeon, you're better off splitting the test (or mocking out
> 'git fast-import' with something that provides consistent output given
> fixed input).

And I'd do that... but the problem is more fundamental than you seem to
understand.  git fast-export can't ship a consistent output order because
it doesn't retain metadata sufficient to totally order child branches.

This is why I wanted unique timestamps.  That would solve the problem:
the child commits of any node would be totally ordered by their commit dates.

But I had a realization just now.  A much smaller change would do it.
Suppose branch creations had creation stamps with a weak uniqueness property:
for any given parent node, the creation stamps of all branches originating
there are guaranteed to be unique.

If that were true, there would be an implied total ordering of the
repository.  The rules for writing out a totally ordered dump would go
like this:

1. At any given step there is a set of active branches and a cursor
on each such branch.  Each cursor points at a commit and caches the
creation stamp of the current branch.

2. Look at the set of commits under the cursors.  Write the oldest one.
If multiple commits have the same commit date, break ties by their
branch creation stamps.

3. Bump that cursor forward. If you're at a branch creation, it
becomes multiple cursors, one for each child branch.
If you're at a join, some cursors go away.
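
Rules 1 to 3 can be sketched in Python roughly as follows. This is only an
illustration under stated assumptions, not code from git or reposurgeon: the
function name `total_order`, the `commits` mapping (name to commit date and
parent list) and the `stamps` mapping (first commit of a branch to its
creation ordinal) are all hypothetical. A commit without its own entry in
`stamps` inherits the stamp of its first parent, a join is written only after
all of its parents, and the commit name is used as a last-resort tie-break so
the sketch stays deterministic even with incomplete stamps.

```python
import heapq


def total_order(commits, stamps):
    """Return commit names in a deterministic dump order.

    commits: {name: (commit_date, [parent names])}   (hypothetical layout)
    stamps:  {first commit of a branch: creation ordinal}
    """
    children = {name: [] for name in commits}
    pending = {}  # number of parents not yet written, per commit
    for name, (_date, parents) in commits.items():
        pending[name] = len(parents)
        for parent in parents:
            children[parent].append(name)

    stamp_of = {}  # branch creation stamp cached by each cursor
    heap = []      # the set of cursors, keyed by (date, stamp, name)
    for name in commits:
        if pending[name] == 0:  # root commits start the first cursors
            stamp_of[name] = stamps.get(name, 0)
            heapq.heappush(heap, (commits[name][0], stamp_of[name], name))

    order = []
    while heap:
        # Rule 2: write the oldest commit; ties fall to the branch stamp.
        _date, _stamp, name = heapq.heappop(heap)
        order.append(name)
        # Rule 3: bump the cursor; a fork yields one cursor per child,
        # and a join becomes ready only when its last parent is written.
        for child in children[name]:
            pending[child] -= 1
            if pending[child] == 0:
                first_parent = commits[child][1][0]
                stamp_of[child] = stamps.get(child, stamp_of[first_parent])
                heapq.heappush(heap, (commits[child][0], stamp_of[child], child))
    return order


# Two branches fork at A with identical commit dates, then merge at M.
example = {
    "A": (1, []),
    "B": (2, ["A"]),
    "C": (2, ["A"]),
    "D": (3, ["B"]),
    "M": (4, ["D", "C"]),
}
print(total_order(example, {"B": 1, "C": 2}))
```

In this example B and C share a commit date, so only their creation stamps
decide which is written first; swapping the two stamps swaps their positions
in the output.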

Here's the clever bit - you make the creation stamp nothing but a
counter that says "This was the Nth branch creation."  And it is
set by these rules:

4. If the branch creation stamp is undefined at branch creation time,
number it in any way you like as long as each stamp is unique. A
defined, documented order would be nice but is not necessary for
streams to round-trip.

5. When writing an export stream, you always utter a reset at the
point of branch creation.

6. When reading an import stream, the ordinal for a new branch is
defined as the number of resets you have seen.
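
The reader side of rule 6 could look something like the Python sketch below.
It is an illustration only: `branch_ordinals` is a hypothetical helper, not
part of git. It assumes that the count includes the reset currently being
read, that a ref which is reset more than once keeps its first ordinal, and
that payloads use the exact-byte-count `data <n>` form (the delimited
`data <<EOF` form is not handled). Payloads are skipped so that a commit
message beginning with "reset " is not miscounted.

```python
import io


def branch_ordinals(stream: bytes) -> dict:
    """Map each ref named by a 'reset' command to its creation ordinal."""
    ordinals = {}
    resets_seen = 0
    buf = io.BytesIO(stream)
    while True:
        line = buf.readline()
        if not line:
            break
        if line.startswith(b"data "):
            count = line[5:].strip()
            if count.isdigit():
                buf.read(int(count))  # skip the payload bytes untouched
            continue
        if line.startswith(b"reset "):
            resets_seen += 1
            ref = line[6:].strip().decode()
            # Assumption: the first reset of a ref defines its ordinal.
            ordinals.setdefault(ref, resets_seen)
    return ordinals


sample = (
    b"reset refs/heads/master\n"
    b"commit refs/heads/master\n"
    b"mark :1\n"
    b"committer A U Thor <author@example.com> 1558387923 -0400\n"
    b"data 12\n"
    b"reset trick\n"
    b"reset refs/heads/topic\n"
    b"from :1\n"
)
print(branch_ordinals(sample))
```

Because the writer always emits a reset at each branch creation (rule 5),
replaying the same stream yields the same ordinals every time.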

Rules 5 and 6 together guarantee that branch creation ordinals round-trip
through export streams.  Thus, streams round-trip and I can have my
regression tests with no change to git's visible interface at all!

I could write this code.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



