git@vger.kernel.org mailing list mirror (one of many)
From: "Eric S. Raymond" <esr@thyrsus.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	git@vger.kernel.org
Subject: Re: Finer timestamps and serialization in git
Date: Sun, 19 May 2019 20:45:59 -0400
Message-ID: <20190520004559.GA41412@thyrsus.com>
In-Reply-To: <86woimox24.fsf@gmail.com>

Jakub Narebski <jnareb@gmail.com>:
> As far as I understand it this would slow down receiving new commits
> tremendously.  Currently great care is taken to not have to parse the
> commit object during fetch or push if it is not necessary (thanks to
> things such as reachability bitmaps, see e.g. [1]).
> 
> With this restriction you would need to parse each commit to get at
> commit timestamp and committer, check if the committer+timestamp is
> unique, and bump it if it is not.

So, I'd want to measure that rather than simply assuming it's a blocker.
Clocks are very cheap these days.

> Also, bumping the timestamp means that the commit changed, means that its
> contents-based ID changed, means that all commits that follow it need
> to have their contents changed...  And now you need to rewrite many
> commits.

What "commits that follow it?" By hypothesis, the incoming commit's
timestamp is bumped (if it's bumped) when it's first added to a branch
or branches, before there are following commits in the DAG.

>    And you also break the assumptions that the same commits have
> the same contents (including date) and the same ID in different
> repositories (some of which may include additional branches, some of
> which may have been part of network of related repositories, etc.).

Wait...unless I completely misunderstand the hash-chain model, doesn't the
hash of a commit depend on the hashes of its parents?  If that's the case,
commits cannot have portable hashes. If it's not, please correct me.

But if it's not, how does your first objection make sense?
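
To make the dependency concrete, here is a minimal sketch (Python, not
git's own code) of how a SHA-1 commit id is derived from the commit's
fields. It assumes a plain commit with no gpgsig or encoding headers;
the identity string and the empty-tree id are illustrative inputs only.
Identical fields give the same id wherever it is computed, while
changing any field, including the timestamp or a parent id, changes it.

```python
import hashlib

# Id of git's well-known empty tree; used here only as a placeholder tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(tree, parents, author, committer, message):
    """Return the SHA-1 id of a plain commit object with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # parent ids are hashed too
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")      # includes the timestamp
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical identities differing only in the timestamp (bumped by 1s).
ident = "A U Thor <author@example.com> 1558313159 -0400"
bumped = "A U Thor <author@example.com> 1558313160 -0400"

root = commit_id(EMPTY_TREE, [], ident, ident, "root\n")
root_bumped = commit_id(EMPTY_TREE, [], ident, bumped, "root\n")

# The child's own fields are identical; only its parent id differs.
child = commit_id(EMPTY_TREE, [root], ident, ident, "child\n")
child_of_bumped = commit_id(EMPTY_TREE, [root_bumped], ident, ident, "child\n")
```

Here root and root_bumped differ, and so do child and child_of_bumped,
even though the two children have the same tree, author, committer and
message.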

> > You don't need a daemon now to write commits to a repository. You can
> > just add stuff to the object store, and then later flip the SHA-1 on a
> > reference; we lock those individual references, but this sort of thing
> > would require a global write lock. This would introduce huge concurrency
> > caveats that are non-issues now.
> >
> > Dumb clients matter. Now you can e.g. have two libgit2 processes writing
> > to ref A and B respectively in the same repo, and they never have to
> > know about each other or care about IPC.

How do they know they're not writing to the same ref?  What keeps
*that* operation atomic?

> You do realize that dates may not be monotonic (because of imperfections
> in clock synchronization), thus the fact that the date is different from
> the parent's does not mean that it is different from an ancestor's.

Good point. That means the O(log2 n) version of the check has to be done
all the time.  Unfortunate.
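
As a rough illustration of what such a check could look like, here is
a sketch in Python. It assumes timestamps are integer ticks and that
every existing (committer, timestamp) pair in the repository is held in
one sorted in-memory list; both the function name and that storage
layout are hypothetical. The lookup is a binary search, so it is
O(log n); note that inserting into a Python list is still O(n), so a
real implementation would want a tree or an on-disk index.

```python
import bisect

def bump_until_unique(sorted_stamps, committer, timestamp):
    """Return the first timestamp >= `timestamp` not yet used by
    `committer`, and record it in `sorted_stamps`.

    `sorted_stamps` is a sorted list of unique (committer, timestamp)
    tuples covering all branches, not just the incoming commit's parent.
    """
    key = (committer, timestamp)
    i = bisect.bisect_left(sorted_stamps, key)   # O(log n) lookup
    # On a collision, bump by one tick and look at the next entry, which
    # is the only place the bumped key could already be present.
    while i < len(sorted_stamps) and sorted_stamps[i] == key:
        key = (committer, key[1] + 1)
        i += 1
    sorted_stamps.insert(i, key)                 # keeps the list sorted
    return key[1]

stamps = []
first = bump_until_unique(stamps, "esr@thyrsus.com", 100)
second = bump_until_unique(stamps, "esr@thyrsus.com", 100)
third = bump_until_unique(stamps, "esr@thyrsus.com", 100)
other = bump_until_unique(stamps, "jnareb@gmail.com", 100)
```

Only a collision with the same committer causes a bump; a different
committer may reuse the same timestamp.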

> >> That's the simple case. The complicated case is checking for date
> >> collisions on *other* branches. But there are ways to make that fast,
> >> too. There's a very obvious one involving a presort that is O(log2
> >> n) in the number of commits.
> 
> I don't think the performance hit you would get would be acceptable.

Again, it's bad practice to assume rather than measure. Human intuitions
about this sort of thing are notoriously unreliable.

> >> Excuse me, but your premise is incorrect.  A git DAG isn't just "any" DAG.
> >> The presence of timestamps makes a total ordering possible.
> >>
> >> (I was a theoretical mathematician in a former life. This is all very
> >> familiar ground to me.)
> 
> Maybe in theory, when all clocks are synchronized.

My assertion does not depend on synchronized clocks, because it doesn't have to.

If the timestamps in your repo are unique, there *is* a total ordering -
by timestamp. What you don't get is guaranteed consistency with the
topo ordering - that is, you get no guarantee that a child's timestamp
is greater than its parents'. That really would require a common
timebase.
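
A toy example of the distinction, as a sketch in Python with made-up
commits (the Commit type and the single-parent history are
simplifications for illustration). The timestamps are unique, so
sorting by them yields a total order, yet commit B was made on a
machine with a slow clock and sorts before its own parent.

```python
from typing import NamedTuple, Optional

class Commit(NamedTuple):
    name: str
    timestamp: int
    parent: Optional[str]   # single parent only, to keep the toy small

commits = [
    Commit("A", 100, None),
    Commit("B", 90, "A"),    # child stamped earlier than its parent
    Commit("C", 120, "B"),
]

def timestamp_order(cs):
    """Total order by timestamp; only defined if timestamps are unique."""
    stamps = [c.timestamp for c in cs]
    if len(set(stamps)) != len(stamps):
        raise ValueError("timestamps are not unique")
    return [c.name for c in sorted(cs, key=lambda c: c.timestamp)]

def respects_topology(cs):
    """True if every child has a later timestamp than its parent."""
    by_name = {c.name: c for c in cs}
    return all(
        c.parent is None or by_name[c.parent].timestamp < c.timestamp
        for c in cs
    )

order = timestamp_order(commits)
consistent = respects_topology(commits)
```

The order comes out as B, A, C: a perfectly good total order, just not
one that agrees with the parent-child relation.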

But I don't need that stronger property, because the purpose of
totally ordering the repo is to guarantee the uniqueness of action
stamps.  For that, all I need is to be able to generate a unique cookie
for each commit that can be inserted in its action stamp.  For my use cases
that cookie should *not* be a hash, because hashes always break N years
down.  It should be an eternally stable product of the commit metadata.
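
For readers who have not seen one, a sketch of building such a stamp
from commit metadata alone, in Python. The layout used here, an RFC
3339 UTC timestamp joined to the committer's email with "!", is an
assumption for illustration; this message does not spell out a format,
and the function name is hypothetical.

```python
from datetime import datetime, timezone

def action_stamp(committer_email, epoch_seconds):
    """Build a stamp from metadata only (assumed layout: time!email)."""
    when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%SZ") + "!" + committer_email

stamp = action_stamp("esr@thyrsus.com", 1558313159)
```

Nothing in the stamp depends on a hash function, so it stays valid if
the repository is ever converted to a different hash; it is unique only
if the (committer, timestamp) pair is.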
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>




