From: Derrick Stolee <stolee@gmail.com>
To: esr@thyrsus.com
Cc: git@vger.kernel.org
Subject: Re: Finer timestamps and serialization in git
Date: Wed, 15 May 2019 21:14:30 -0400
Message-ID: <ab3222ab-9121-9534-1472-fac790bf08a4@gmail.com>
In-Reply-To: <20190515233230.GA124956@thyrsus.com>

On 5/15/2019 7:32 PM, Eric S. Raymond wrote:
> Derrick Stolee <stolee@gmail.com>:
>> On 5/15/2019 3:16 PM, Eric S. Raymond wrote:
>>> The deeper problem is that I want something from Git that I cannot
>>> have with 1-second granularity. That is: a unique timestamp on each
>>> commit in a repository.
>>
>> This is impossible in a distributed version control system like Git
>> (where the commits are immutable). No matter your precision, there is
>> a chance that two commits are created at the exact same moment on two
>> different machines and then those commits are merged into the same branch.
>
> It's easy to work around that problem. Each git daemon has to single-thread
> its handling of incoming commits at some level, because you need a lock on the
> file system to guarantee consistent updates to it.
>
> So if a commit comes in that would be the same as the date of the
> previous commit on the current branch, you bump the incoming commit timestamp.

This changes the commit, causing it to have a different object id, and
now the client that pushed that commit disagrees with your machine on
the history.

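A minimal sketch of why bumping a timestamp yields a different commit, assuming the usual SHA-1 object naming (the hash of a "<type> <size>\0" header followed by the object body). The helper names, identities, and timestamps below are made up for illustration; only the empty-tree id is a real, well-known value.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash a "<type> <size>\\0" header followed by the body (SHA-1 repositories)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(committer_time: int) -> bytes:
    """Build a hypothetical commit; only the committer timestamp varies."""
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"  # the empty tree
        "author A U Thor <author@example.com> 1557969270 -0400\n"
        f"committer C O Mitter <committer@example.com> {committer_time} -0400\n"
        "\n"
        "Example commit\n"
    ).encode()

pushed = git_object_id("commit", commit_body(1557969270))  # what the client created
bumped = git_object_id("commit", commit_body(1557969271))  # server adds one second

print(pushed)
print(bumped)
print("same commit?", pushed == bumped)
```

The two ids differ, so the client and the server would now name the "same" commit differently.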
> That's the simple case. The complicated case is checking for date
> collisions on *other* branches. But there are ways to make that fast,
> too. There's a very obvious one involving a presort that is O(log2
> n) in the number of commits.
>
> I wouldn't have brought this up in the first place if I didn't have a
> pretty clear idea how to do it in code!
>
>> Even when you specify a committer, there are many environments where a set
>> of parallel machines are creating commits with the same identity.
>
> If those commit sets become the same commit in the final graph, this is
> not a problem for total ordering.
>
>>> Why do I want this? There are a number of reasons, all related to a
>>> mathematical concept called "total ordering". At present, commits in
>>> a Git repository only have partial ordering.
>>
>> This is true of any directed acyclic graph. If you want a total ordering
>> that is completely unambiguous, then you should think about maintaining
>> a linear commit history by requiring rebasing instead of merging.
>
> Excuse me, but your premise is incorrect. A git DAG isn't just "any" DAG.
> The presence of timestamps makes a total ordering possible.
>
> (I was a theoretical mathematician in a former life. This is all very
> familiar ground to me.)

Same. But you seem to have a fundamental misunderstanding about the immutability
of commits, which is core to how Git works. If you change a commit, then you
get a new object id and now distributed copies don't agree on the history.

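To illustrate how far such a change reaches, here is a rough sketch in which a hypothetical server bumps one timestamp in a three-commit chain. Because each commit records its parent's id, every descendant gets a new id as well, even when its own timestamp is untouched. The function names, identity, and timestamps are assumptions for illustration, and the hashing assumes a SHA-1 repository.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of Git's empty tree

def commit_id(parent, committer_time, message):
    """Compute the id of a hypothetical commit with an optional single parent."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author A U Thor <author@example.com> {committer_time} +0000")
    lines.append(f"committer A U Thor <author@example.com> {committer_time} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()

def build_history(times):
    """Chain commits linearly, one per timestamp, and return their ids in order."""
    ids, parent = [], None
    for index, when in enumerate(times):
        parent = commit_id(parent, when, f"commit {index}")
        ids.append(parent)
    return ids

client = build_history([100, 100, 101])  # second commit collides with the first
server = build_history([100, 101, 101])  # server bumps the second commit by 1s

for mine, theirs in zip(client, server):
    print(mine[:12], theirs[:12], "agree" if mine == theirs else "DIFFER")
```

Only the first commit keeps its id; the bumped commit and everything built on top of it diverge between the two copies.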
>>> One consequence is that
>>> action stamps - the committer/date pairs I use as VCS-independent commit
>>> identifications in reposurgeon - are not unique. When a patch sequence
>>> is applied, it can easily happen fast enough to give several successive
>>> commits the same committer-ID and timestamp.
>>
>> Sorting by committer/date pairs sounds like an unhelpful idea, as that
>> does not take any graph topology into account. It happens that a commit
>> can actually have an _earlier_ commit date than its parent.
>
> Yes, I'm aware of that. The uniqueness properties that make a total
> ordering desirable are not actually dependent on timestamp order
> coinciding with topo order.
>
>> Changing the granularity of timestamps requires changing the commit format,
>> which is probably a non-starter.
>
> That's why I started by noting that you're going to have to break the
> format anyway to move to an ECDSA hash (or whatever you end up using).
>
> I'm saying that *since you'll need to do that anyway*, it's a good time
> to think about making timestamps finer-grained and unique.

That change is difficult enough as it is. I don't think your goals justify
making this more complicated. You are also not considering:

* The in-memory data type now needs to be a floating-point type, or an
  even larger integer type using a different set of units.

* This data type now affects our priority queues for commit walks, how
  we store the commit date in the commit-graph file, and how we compute
  relative dates for 'git log' pretty formats.

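A rough illustration of the trade-off in the first bullet, under two assumptions that are not from this thread: the floating-point option is an IEEE 754 double holding seconds since the epoch, and the "different set of units" option is a signed 64-bit count of nanoseconds. The sample timestamp is arbitrary.

```python
import math
import struct

seconds = 1557969270   # an epoch time in May 2019
nanos = 123_456_789    # hypothetical sub-second part

# Option 1: a double holding fractional seconds.
as_float = seconds + nanos / 1e9
recovered = round((as_float - seconds) * 1e9)
# Gap between adjacent doubles near this value (53-bit significand).
spacing = 2.0 ** (math.frexp(float(seconds))[1] - 53)

print("double spacing in ns:", spacing * 1e9)
print("nanoseconds survive the round trip?", recovered == nanos)

# Option 2: a signed 64-bit integer counting nanoseconds since the epoch.
total_ns = seconds * 10**9 + nanos
packed = struct.pack("<q", total_ns)   # raises struct.error if it does not fit
print("fits in int64:", total_ns < 2**63)
print("round trip:", struct.unpack("<q", packed)[0] == total_ns)
```

With these assumptions, a double cannot tell apart timestamps closer than a few hundred nanoseconds at present-day epoch values, while a signed 64-bit nanosecond count is exact but runs out around the year 2262.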
-Stolee