git@vger.kernel.org mailing list mirror (one of many)
From: Jakub Narebski <jnareb@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: esr@thyrsus.com, Derrick Stolee <stolee@gmail.com>, git@vger.kernel.org
Subject: Re: Finer timestamps and serialization in git
Date: Mon, 20 May 2019 01:15:47 +0200
Message-ID: <86woimox24.fsf@gmail.com>
In-Reply-To: <87woiqvic4.fsf@evledraar.gmail.com> ("Ævar Arnfjörð Bjarmason"'s message of "Thu, 16 May 2019 11:50:35 +0200")

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> On Thu, May 16 2019, Eric S. Raymond wrote:
>> Derrick Stolee <stolee@gmail.com>:
>>> On 5/15/2019 3:16 PM, Eric S. Raymond wrote:
>>>> The deeper problem is that I want something from Git that I cannot
>>>> have with 1-second granularity. That is: a unique timestamp on each
>>>> commit in a repository.
>>>
>>> This is impossible in a distributed version control system like Git
>>> (where the commits are immutable). No matter your precision, there is
>>> a chance that two machines commit at the exact same moment on two different
>>> machines and then those commits are merged into the same branch.
>>
>> It's easy to work around that problem. Each git daemon has to single-thread
>> its handling of incoming commits at some level, because you need a lock on the
>> file system to guarantee consistent updates to it.

As far as I understand it, this would slow down receiving new commits
tremendously.  Currently, great care is taken to avoid parsing commit
objects during fetch or push when it is not necessary (thanks to
things such as reachability bitmaps; see e.g. [1]).
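To illustrate why bitmaps let a server answer "what do I need to send?" without parsing commits, here is a toy sketch in which each commit's precomputed reachability bitmap is a plain Python integer used as a bitset. The object names, bit positions and the function name are hypothetical; real Git stores EWAH-compressed bitmaps in `.bitmap` files next to the pack.

```python
# Toy model of reachability bitmaps (hypothetical data, not Git's on-disk format).
# Each object gets a bit position; each bitmapped commit stores the set of
# objects reachable from it as a bitset (here a plain Python int).

OBJECTS = ["c1", "t1", "c2", "t2", "c3", "t3"]  # bit i stands for OBJECTS[i]

def bits(*names):
    """Build a bitset from object names."""
    value = 0
    for name in names:
        value |= 1 << OBJECTS.index(name)
    return value

# Precomputed bitmaps for a linear history c1 <- c2 <- c3 (commits plus their trees).
BITMAPS = {
    "c1": bits("c1", "t1"),
    "c2": bits("c1", "t1", "c2", "t2"),
    "c3": bits("c1", "t1", "c2", "t2", "c3", "t3"),
}

def objects_to_send(wants, haves):
    """Objects reachable from `wants` but not from `haves`, using only bit operations."""
    want_bits = 0
    for commit in wants:
        want_bits |= BITMAPS[commit]
    have_bits = 0
    for commit in haves:
        have_bits |= BITMAPS[commit]
    missing = want_bits & ~have_bits
    return [name for i, name in enumerate(OBJECTS) if (missing >> i) & 1]

print(objects_to_send(wants=["c3"], haves=["c1"]))
```

No commit object is opened in `objects_to_send`; a per-commit uniqueness check on timestamps would reintroduce exactly that parsing.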

With this restriction you would need to parse each incoming commit to
get at its committer and commit timestamp, check whether the
committer+timestamp pair is unique, and bump the timestamp if it is not.
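A minimal sketch of that receive-time check, assuming the raw commit bytes are already in hand and that uniqueness is tracked in an in-memory set of (committer identity, timestamp) pairs. The function name and the one-second bump policy are hypothetical illustrations of the proposal, not anything Git does; the `committer <ident> <unix-time> <tz>` header line is the real commit format.

```python
import re

# Matches a header line such as "committer A U Thor <a@example.com> 1558307747 +0200".
COMMITTER_RE = re.compile(rb"^committer (.*) (\d+) ([+-]\d{4})$", re.MULTILINE)

def bump_if_duplicate(raw_commit, seen):
    """Hypothetical check: parse the committer line, bump the timestamp by one
    second until (identity, timestamp) is unused, record it in `seen`, and
    return the (possibly rewritten) commit bytes."""
    match = COMMITTER_RE.search(raw_commit)
    if match is None:
        raise ValueError("commit has no committer line")
    ident, ts, tz = match.group(1), int(match.group(2)), match.group(3)
    while (ident, ts) in seen:
        ts += 1  # the bump changes the commit's bytes, and therefore its object ID
    seen.add((ident, ts))
    new_line = b"committer " + ident + b" " + str(ts).encode() + b" " + tz
    return raw_commit[:match.start()] + new_line + raw_commit[match.end():]
```

Every incoming commit has to be read and matched here, and the `seen` set has to cover the whole repository, which is the cost being objected to.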

Also, bumping the timestamp means that the commit changed, which means
that its content-based ID changed, which means that all commits that
follow it need to have their contents (their parent IDs) changed...  And
now you need to rewrite many commits.  You also break the assumption
that the same commit has the same contents (including date) and the same
ID in different repositories (some of which may include additional
branches, some of which may have been part of a network of related
repositories, etc.).

[1]: https://github.blog/2015-09-22-counting-objects/
     http://githubengineering.com/counting-objects/

> You don't need a daemon now to write commits to a repository. You can
> just add stuff to the object store, and then later flip the SHA-1 on a
> reference; we lock those individual references, but this sort of thing
> would require a global write lock. This would introduce huge concurrency
> caveats that are non-issues now.
>
> Dumb clients matter. Now you can e.g. have two libgit2 processes writing
> to ref A and B respectively in the same repo, and they never have to
> know about each other or care about IPC.
>
> Also, even if you have daemons accepting pushes they can now be on
> different computers sharing things over e.g. an NFS filesystem. Now you
> need some FS-based serialization protocol for commits and their
> timestamps.
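The per-reference locking Ævar describes can be sketched as follows, loosely modelled on Git's lockfile convention for loose refs (create `<ref>.lock` exclusively, write the new value, rename it into place). The function name `update_ref` is hypothetical, and the sketch omits the old-value check, reflogs and packed-refs that real Git handles.

```python
import os

def update_ref(ref_path, new_oid):
    """Hypothetical per-ref update: writers of different refs never contend;
    a second writer of the same ref fails while the lock is held."""
    lock_path = ref_path + ".lock"
    # O_CREAT | O_EXCL makes lock acquisition atomic: it raises FileExistsError
    # if another process already holds <ref>.lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(new_oid + "\n")
        # Atomic rename: readers see either the old value or the new one.
        os.replace(lock_path, ref_path)
    except BaseException:
        # Release our own lock on failure so the ref is not left locked.
        if os.path.exists(lock_path):
            os.unlink(lock_path)
        raise
```

Two processes updating `refs/heads/A` and `refs/heads/B` each touch only their own lock file, so no coordination between them is needed; a repository-wide timestamp uniqueness rule would turn this into a single global lock.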

Also, performance matters, especially for large repositories and for
large numbers of repositories.

>> So if a commit comes in that would be the same as the date of the
>> previous commit on the current branch, you bump the incoming commit timestamp.

You do realize that dates may not be monotonic (because of imperfections
in clock synchronization); thus the fact that a commit's date differs
from its parent's does not mean that it differs from the dates of all
its ancestors.
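A small hypothetical history shows the gap: once one committer's clock is behind, checking a commit's date only against its parent misses a collision with an older ancestor. The commit names and dates are made up for illustration.

```python
# Hypothetical history A <- B <- C <- D with committer timestamps in seconds.
# C's clock was slow, so C is dated *earlier* than its parent B.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
DATES = {"A": 1000, "B": 1005, "C": 999, "D": 1000}

def ancestors(commit):
    """All commits reachable from `commit` via parent links (excluding itself)."""
    seen, stack = set(), list(PARENTS[commit])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def differs_from_parents(commit):
    return all(DATES[p] != DATES[commit] for p in PARENTS[commit])

def differs_from_ancestors(commit):
    return all(DATES[a] != DATES[commit] for a in ancestors(commit))

print(differs_from_parents("D"), differs_from_ancestors("D"))  # True False
```

D passes the parent-only check, yet it shares its timestamp with A; catching that requires walking (or indexing) the whole ancestry.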

>> That's the simple case. The complicated case is checking for date
>> collisions on *other* branches. But there are ways to make that fast,
>> too. There's a very obvious one involving a presort that is O(log2 n)
>> in the number of commits.

I don't think the performance hit you would get would be acceptable.

[...]
>>>> Why do I want this? There are a number of reasons, all related to a
>>>> mathematical concept called "total ordering".  At present, commits in
>>>> a Git repository only have partial ordering.
>>>
>>> This is true of any directed acyclic graph. If you want a total ordering
>>> that is completely unambiguous, then you should think about maintaining
>>> a linear commit history by requiring rebasing instead of merging.
>>
>> Excuse me, but your premise is incorrect.  A git DAG isn't just "any" DAG.
>> The presence of timestamps makes a total ordering possible.
>>
>> (I was a theoretical mathematician in a former life. This is all very
>> familiar ground to me.)

Maybe in theory, when all clocks are synchronized.  But not in
practice.  Shit happens.  Just recently Mike Hommey wrote [2] about a
case he has to deal with:

MH> I'm hitting another corner case in some other "weird" history, where
MH> I have 500k commits all with the same date.

[2]: https://public-inbox.org/git/20190518005412.n45pj5p2rrtm2bfj@glandium.org/t/#u
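One way to get a total order that survives identical or skewed dates, sketched here as an illustration rather than anything Git implements as such, is to topologically sort the DAG (every parent before its children, via Kahn's algorithm) and break ties deterministically by (date, commit ID). The function name and the toy commit names are hypothetical; with real data the IDs would be hex object names.

```python
import heapq

def total_order(parents, dates):
    """Deterministic total order of a commit DAG: parents always precede their
    children, and among commits that are ready at the same time the smallest
    (date, id) pair goes first.  `parents` maps id -> list of parent ids."""
    children = {c: [] for c in parents}
    pending = {}  # number of parents not yet emitted
    for c, ps in parents.items():
        pending[c] = len(ps)
        for p in ps:
            children[p].append(c)
    ready = [(dates[c], c) for c, n in pending.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, c = heapq.heappop(ready)
        order.append(c)
        for child in children[c]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, (dates[child], child))
    return order

# A merge history where every commit has the same date.
same_date = {c: 1000 for c in ("a1", "b2", "c3", "d4")}
print(total_order({"a1": [], "b2": ["a1"], "c3": ["a1"], "d4": ["b2", "c3"]}, same_date))
```

With every date equal, as in the case quoted above, the order is decided entirely by the graph and the ID tie-break, so the timestamps contribute nothing to it.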

--
Jakub Narębski
