From: "Eric S. Raymond" <esr@thyrsus.com>
To: Elijah Newren <newren@gmail.com>
Cc: "Jakub Narebski" <jnareb@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: Finer timestamps and serialization in git
Date: Mon, 20 May 2019 19:12:23 -0400
Message-ID: <20190520231223.GA117962@thyrsus.com>
In-Reply-To: <CABPp-BHK1N2zZoeBeSgnh12LPqLgZxfbL0DzALj28y97_Q-ahg@mail.gmail.com>

Elijah Newren <newren@gmail.com>:
> Hi,
> 
> On Mon, May 20, 2019 at 11:09 AM Eric S. Raymond <esr@thyrsus.com> wrote:
> 
> > > For cookie to be unique among all forks / clones of the same repository
> > > you need either centralized naming server, or for the cookie to be based
> > > on contents of the commit (i.e. be a hash function).
> >
> > I don't need uniqueness across all forks, only uniqueness *within the repo*.
> 
> You've lost me.  In other places you stated you didn't want to use the
> commit hash, and now you say this.  If you only care about uniqueness
> within the current copy of the repo and don't care about uniqueness
> across forks (i.e. clones or copies that exist now or in the future --
> including copies stored using SHA256), then what's wrong with using
> the commit hash?

Because it's not self-describing, can't be computed solely from visible
commit metadata, and relies on complex external assumptions about how
the hash is computed which break when your VCS changes hash algorithms.

These are dealbreakers because one of my major objectives is forward
portability of these IDs forever. And I mean *forever*.  It should be
possible for someone in the year 40,000, in between assaulting planets
for the God-Emperor, to look at an import stream and deduce how to
resolve the cookies to their commits without seeing git's code or
knowing anything about its hash algorithms.

I think maybe the reason I'm having so much trouble getting this
across is that git insiders are used to thinking of import streams as
transient things.  Because I do a lot of repo migrations, I have a
very different view of them.  I built reposurgeon on the realization
that they're a general transport format for revision histories, and
that has forward value independent of the existence of git.

If a stream contained fully forward-portable action stamps, it would be
forward-portable forever.  Hashes in commit comments are the *only*
blocker to that.  Take this from a person who has spent way too much time
patching Subversion IDs like r1234 during repository conversions.

It would take so little to make this work. Existing stream format is
*almost there*.

> A stable ordering of commits in a fast-export stream might be a cool
> feature.  But I don't know how to define one, other than perhaps sort
> first by commit-depth (maybe optionally adding a few additional
> intermediate sorting criteria), and then finally sort by commit hash
> as a tiebreaker. Without the fallback to commit hash, you fall back
> on normal traversal order which isn't stable (it depends on e.g. order
> of branches listed on the command line to fast-export, or if using
> --all, what new branch you just added that comes alphabetically before
> others).
>
> I suspect that solution might run afoul of your dislike for commit
> hashes, though, so I'm not sure it'd work for you.

It does. See above.
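
For reference, the ordering proposed in the quoted text (commit depth first, commit hash as the final tiebreaker) can be sketched as follows. This is only an illustration: the names `commit_depths` and `stable_order` are hypothetical, "depth" is assumed to mean the longest path back to a root commit, and the history is assumed to arrive as a plain dict from commit hash to parent hashes.

```python
def commit_depths(parents):
    """Return {commit: depth}, where depth is the longest path to a root.

    `parents` maps each commit hash to a list of its parent hashes.
    Iterative on purpose, so long histories do not hit the recursion limit.
    """
    depth = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in depth:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in depth]
            if pending:
                # Resolve the parents first, then come back to this commit.
                stack.extend(pending)
            else:
                depth[commit] = 1 + max(
                    (depth[p] for p in parents[commit]), default=-1
                )
                stack.pop()
    return depth


def stable_order(parents):
    """Sort by depth, then by hash, so the result ignores traversal order."""
    depth = commit_depths(parents)
    return sorted(parents, key=lambda commit: (depth[commit], commit))
```

Because a child is always deeper than each of its parents, parents sort before children, and the result does not depend on which branch was listed first. It does depend on the hash, which is exactly the objection raised above.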

> > So let me back up a step.  I will cheerfully drop advocating bumping
> > timestamps if anyone can tell me a different way to define a per-commit
> > reference cookie that (a) is unique within its repo, and (b) only requires
> > metadata visible in the fast-export representation of the commit.
> 
> Does passing --show-original-ids option to fast-export and using the
> resulting original-oid field as the cookie count?

I was not aware of this option.  Looking...no wonder, it's not on my
system man page.  Must be recent.

OK. Wow.  That is *useful*, and I am going to upgrade reposurgeon to read
it.  With that I can do automatic commit-reference rewriting.
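
A minimal sketch of what reading that field could look like, assuming the stream comes from `git fast-export --show-original-ids` and that each commit carries a `mark` line followed by an `original-oid` line. The function name `oid_to_mark` and the sample stream are made up for illustration; this is not reposurgeon's code.

```python
import io


def oid_to_mark(stream):
    """Map each commit's original-oid to its mark in a fast-export stream.

    `stream` is a binary file-like object. Only counted `data <n>` blocks
    are handled, which is the form fast-export emits.
    """
    table = {}
    in_commit = False
    mark = None
    while True:
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # Skip the payload so its bytes are never parsed as commands.
            stream.read(int(line[5:].decode()))
        elif line.startswith(b"commit "):
            in_commit, mark = True, None
        elif line.startswith((b"blob", b"tag ", b"reset ")):
            in_commit = False
        elif in_commit and line.startswith(b"mark "):
            mark = line.split(b" ", 1)[1].strip().decode()
        elif in_commit and line.startswith(b"original-oid ") and mark:
            table[line.split(b" ", 1)[1].strip().decode()] = mark
    return table


# Hypothetical two-object stream: one blob and one commit.
BLOB_OID = "a" * 40
COMMIT_OID = "b" * 40
MESSAGE = "first commit\n"
SAMPLE = (
    "blob\n"
    "mark :1\n"
    f"original-oid {BLOB_OID}\n"
    "data 6\n"
    "hello\n"
    "commit refs/heads/master\n"
    "mark :2\n"
    f"original-oid {COMMIT_OID}\n"
    "author A U Thor <author@example.com> 1558393943 -0400\n"
    "committer A U Thor <author@example.com> 1558393943 -0400\n"
    f"data {len(MESSAGE)}\n"
    f"{MESSAGE}"
    "M 100644 :1 hello.txt\n"
    "\n"
).encode()

table = oid_to_mark(io.BytesIO(SAMPLE))
```

With a table like this, a hash quoted in a commit comment can be resolved to the mark of the commit it names, which is the lookup that automatic reference rewriting needs.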

I don't consider it a complete solution. The problem is that OID is
a consistent property that can be used to resolve cookies, but there's
no guarantee that it's a *preserved* property that survives multiple
round trips and changes in hash functions.

So the right way to use it is to pick it up, do reference-cookie
resolution, and then mung the reference cookies to a format that is
stable forever.  I don't know what that format should be yet.  I
have a message in composition about this.
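
As a rough sketch of that pick-up, resolve, and mung sequence: the code below replaces hash references in a commit comment with cookies from a lookup table. Everything about the cookie is an assumption, since the stable format is still undecided here; the timestamp-plus-email shape and the `[[...]]` wrapper are placeholders only, and `rewrite_refs` is a hypothetical name.

```python
import re

# Candidate references: 7 to 40 lowercase hex digits standing alone.
HEX_REF = re.compile(r"\b[0-9a-f]{7,40}\b")


def rewrite_refs(message, cookies):
    """Replace hash references in `message` with stable cookies.

    `cookies` maps full original OIDs to cookie strings. A reference is
    rewritten only when it is a prefix of exactly one known OID; anything
    ambiguous or unknown is left untouched.
    """
    def resolve(match):
        ref = match.group(0)
        hits = [oid for oid in cookies if oid.startswith(ref)]
        if len(hits) == 1:
            return f"[[{cookies[hits[0]]}]]"  # placeholder wrapper
        return ref

    return HEX_REF.sub(resolve, message)


# Hypothetical cookie: committer timestamp and email, format not settled.
cookies = {"b" * 40: "2019-05-20T23:12:23Z!author@example.com"}
rewritten = rewrite_refs("Reverts bbbbbbbbbb, see also deadbeef1234.", cookies)
```

Leaving unresolved references alone is deliberate: a wrong rewrite is harder to spot later than a stale hash.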
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


