git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "Mike Hommey" <mh@glandium.org>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	git@vger.kernel.org
Subject: Re: Revision walking, commit dates, slop
Date: Mon, 20 May 2019 15:42:23 +0200	[thread overview]
Message-ID: <86ftp9p7i8.fsf@gmail.com> (raw)
In-Reply-To: <cfa2c367-5cd7-add5-0293-caa75b103f34@gmail.com> (Derrick Stolee's message of "Mon, 20 May 2019 07:20:01 -0400")

Derrick Stolee <stolee@gmail.com> writes:
> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>>
>> Are there any blockers that prevent the switch to this
>> "generation number v2"?
>> 
>> - Is it a problem with insufficient data to choose the correct numbering
>>   as "generation number v2' (there can be only one)?
>> - Is it a problem with selected "generation number v2" being
>>   incompatibile with gen v2, and Git failing when new version of
>>   commit-graph is used instead of softly just not using commit-graph?
>> - Or is it something else?

Thanks for the explanation.

> The switch becomes a bit complicated.
>
> First, the plan was to version the commit-graph file to v2, and that would
> include a byte in the header for the "reachability index version" [1]. Since
> older clients fail hard on a newer file version, we are switching to instead
> including the reachability index version as a value in a more flexible 
> "metadata chunk" [2].

Ugh, that is bad.  The version number inn the commit-graph file format
was supposed to make it possible to easily change the format; now it
looks like we are stuck with workarounds until the released version that
dies on never commit-graph file format version innstead of softly not
utilizing the commit-graph file dies off.

How this issue got missed in review...

If we cannot change the format, all that is left is ading new chunks,
and changes that conform to commit-graph file version 1.  

>                      Using the generation number column for the corrected
> commit-date offsets (assuming we also guarantee the offset is strictly
> increasing from parent to child), these new values will be backwards-
> compatible _except_ for 'git commit-graph verify'.

O.K., so the "generation number v2 (legacy)" would be incremental and
backward-compatibile in use (though not in generation and validation).

Do I understand it correctly how it is calculated:

  corrected_date(C) = max(committer_date(C),
                          max_{P ∈ parents(C)}(corrected_date(P)) + 1)
  offset(C) = corrected_date(C) - committer_date(C)
  gen_v2(C) = max(offset(C), max_{P ∈ parents(C)}(gen_v2(P)) + 1) 

Do you have benchmark for this "monotonically offset corrected commit
date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
and https://github.com/derrickstolee/gen-test ?


Also, what would happen if different versions of Git tried to add to
commit-graph in interleaved way, either with rewrite or incremental?

> Second, we need to pull the reachability index value into a commit slab.

Is commit slab documented somewhere in Documentation/technical/, or just
in comments in commit-slab.h?

As I understand it, commit slab is Git-specific implementition of
inside-out object storage for commit data (i.e. struct of arrays instead
of array of structs), isn't it?  I wonder if using commit slab improves
cache utilization...

> The generation value is currently 32 bits, but we will expand that to
> 64 as it stores a timestamp. The commit struct is a bit bloated already,
> so this will reduce the required memory space even when not using the
> commit-graph. But, it requires some refactoring, including every place
> where we pass a "min_generation" needs to change type and name.

Could this be done with Coccinelle's spatch, similar to
e.g. contrib/coccinelle/commit.cocci?

>
> Third and finally, we need to calculate the new values and change the
> read logic to sum the offset and commit-date (when the metadata chunk
> says we are using corrected commit date).

Right.

> While none of this is _incredibly_ hard to do, it does require a bit
> of care. It's on my list to get to at some point, but making the file
> incremental is higher priority to me.

All right, I can understand that.

> [1] https://public-inbox.org/git/pull.112.git.gitgitgadget@gmail.com/
> [2] https://public-inbox.org/git/87h8acivkh.fsf@evledraar.gmail.com/

Best,
--
Jakub Narębski

  reply	other threads:[~2019-05-20 13:42 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-18  0:54 Revision walking, commit dates, slop Mike Hommey
2019-05-18  1:50 ` SZEDER Gábor
2019-05-18  3:58   ` Mike Hommey
2019-05-18  4:17     ` Mike Hommey
2019-05-18 12:01       ` SZEDER Gábor
2019-05-19 22:28         ` Jakub Narebski
2019-05-20  1:33       ` Derrick Stolee
2019-05-20 11:02         ` Jakub Narebski
2019-05-20 11:20           ` Derrick Stolee
2019-05-20 13:42             ` Jakub Narebski [this message]
2019-05-20 23:27               ` Jakub Narebski
2019-05-21  1:20                 ` Derrick Stolee
2019-05-22 18:29                   ` Jakub Narebski
2019-05-22 19:06                     ` Derrick Stolee
2019-05-23 21:04                       ` Jakub Narebski
2019-06-25  7:51               ` Jakub Narebski
2019-06-25 10:54                 ` Derrick Stolee
2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
2019-09-18 12:26                     ` Derrick Stolee
2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22  2:19             ` Mike Hommey
2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86ftp9p7i8.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=mh@glandium.org \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).