git@vger.kernel.org mailing list mirror (one of many)
From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Abhishek Kumar <abhishekkumar8222@gmail.com>,
	git@vger.kernel.org,
	Christian Couder <christian.couder@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [RFC][GSoC] Implement Generation Number v2
Date: Mon, 23 Mar 2020 14:43:37 +0100
Message-ID: <86mu87qj92.fsf@gmail.com>
In-Reply-To: <xmqq369z7i1b.fsf@gitster.c.googlers.com> (Junio C. Hamano's message of "Sun, 22 Mar 2020 22:32:32 -0700")

Junio C Hamano <gitster@pobox.com> writes:
> Abhishek Kumar <abhishekkumar8222@gmail.com> writes:
>> Jakub Narębski <jnareb@gmail.com> writes:
[...]
>>> Unfortunately for the time being we cannot use commit-graph format
>>> version; the idea that was proposed on the mailing list (when we found
>>> about the bug in handling commit-graph versioning, during incremental
>>> commit-graph implementation), was to create and use metadata chunk or
>>> versioning chunk (the final version of incremental format do not use
>>> this mechanism).  This could be used by gen2 compatibile Git to
>>> distinguish between situation where old commit-graph file to be updated
>>> uses generation number v1, and when it uses v2.
>>> 
>>> If you have a better idea, please say so.
>>
>> We could also use a flag file. Here's how it works:
>>
>> If the file `.git/info/generation-number-v2` exists, use gen2.
>> Otherwise use gen1.
>
> If the file is lost then we will try to read the other file that has
> the commit-graph data as if it were in old format?  And if such a
> file was created (say, with "touch .git/info/generation-number-v2"),
> a file in the original format will be read as if it is in new
> format?  If that is the case, it is likely that we'd see a segfault;
> sounds too brittle to me.
>
> It appears that the format of "CDAT", and the fact that generation
> is represented as higher 30-bit of a be32 integer, is very much
> hardcoded in the design and is hard to change, but your new version
> of graph file can be designed not to use "CDAT" chunk at all, and
> instead have the commit data with new version of generation numbers
> stored in a different chunk (say "CDA2") to force older version of
> Git not to use the new graph file---would that work?

It looks like there are a few possible ways of handling introduction of
generation numbers v2.  Let's consider them one by one.

The problem we need to solve is the co-existence of old Git (which does
not understand v2, and which hard fails on a commit-graph format version
bump) and new Git (which understands and writes v2, and which I assume
soft fails, that is, it simply doesn't use the commit-graph file if it
is of an unknown version).


If the commit-graph file was written by new Git, and includes generation
numbers v2, we want old Git at the very least not to crash; it would be
acceptable for it not to use the commit-graph at all; best would be if
it could still use the commit-graph in a suboptimal way.  We also need
to handle old Git trying to update (in an incremental or non-incremental
way) the commit-graph file.

If the commit-graph file was written by old Git, and includes generation
numbers v1 (topological levels), we want new Git to recognize this and,
at best, use those old generation numbers in a correct way.  We want new
Git to be able to update the commit-graph file (in an incremental or
non-incremental way).

Did I miss anything?


Proposed solutions are:
 - metadata / versioning chunk,
 - flag file: `.git/info/generation-number-v2`,
 - new chunk for commit data: "CDA2".

I would like to propose yet another solution: putting generation number
v2 data in a separate chunk (and possibly keeping generation number v1
in the CDAT commit data chunk).  In this case we could even use ordinary
corrected commit date as generation number v2 (storing offsets as 32-bit
unsigned values), instead of backward-compatible corrected commit date
with monotonic offsets.
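
To make the two kinds of generation numbers concrete, here is a minimal
Python sketch (not Git's implementation).  It assumes the usual
definitions: the topological level of a commit is one more than the
largest level among its parents (1 for a root commit), and the corrected
commit date is the larger of the commit's own date and one more than the
largest corrected date among its parents.  The commit names, the dates
and the function name are made up for illustration.

```python
def generation_numbers(commits):
    """commits: dict name -> (commit_date, [parent names]), listed
    parents-first (hypothetical toy input, not a Git data structure)."""
    level = {}      # generation number v1: topological level
    corrected = {}  # generation number v2: corrected commit date
    for name, (date, parents) in commits.items():
        level[name] = 1 + max((level[p] for p in parents), default=0)
        corrected[name] = max(
            [date] + [corrected[p] + 1 for p in parents])
    # The offset is what a separate chunk would need to store.
    offset = {n: corrected[n] - commits[n][0] for n in commits}
    return level, corrected, offset

# C has clock skew: its commit date is older than its parent's.
history = {
    "A": (1000, []),
    "B": (1100, ["A"]),
    "C": (1050, ["B"]),
    "D": (1200, ["B", "C"]),
}
level, corrected, offset = generation_numbers(history)
```

In this toy history only C, the commit with the skewed clock, ends up
with a non-zero offset.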

Each solution has its advantages and disadvantages.


With the flag file, the problem is (as Junio noticed) that if the file
gets accidentally deleted, new Git would incorrectly think that the
commit-graph uses generation number v1... which, while suboptimal,
should not be bad thanks to backward compatibility.  But I think the
flag file should have some kind of checksum as its contents (perhaps
simply a copy of the commit-graph file checksum, or one checksum per
file in the chain with incremental commit-graph), so that if old Git
rewrites the commit-graph file leaving the flag file present, new Git
would notice this.
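
A rough Python sketch of that check, for the non-incremental case only.
It assumes that the flag file holds a hex copy of the checksum that
ends the commit-graph file, and that the repository uses SHA-1, so the
checksum is the last 20 bytes of the file.  The function names and the
flag file contents are hypothetical, and the demo uses stand-in files
rather than real commit-graph data.

```python
import os
import tempfile

HASH_LEN = 20  # assumption: SHA-1 repository, 20-byte trailing checksum

def graph_checksum(graph_path):
    """Return the trailing checksum of a commit-graph file as hex."""
    with open(graph_path, "rb") as f:
        return f.read()[-HASH_LEN:].hex()

def flag_says_v2(graph_path, flag_path):
    """True only if the flag file exists and matches the graph file."""
    try:
        with open(flag_path) as f:
            recorded = f.read().strip()
    except FileNotFoundError:
        return False  # no flag: assume generation number v1
    return recorded == graph_checksum(graph_path)

# Demo with stand-in files (not real commit-graph data).
tmp = tempfile.mkdtemp()
graph = os.path.join(tmp, "commit-graph")
flag = os.path.join(tmp, "generation-number-v2")
with open(graph, "wb") as f:
    f.write(b"payload written by new Git" + bytes(range(20)))
missing_flag = flag_says_v2(graph, flag)
with open(flag, "w") as f:
    f.write(graph_checksum(graph) + "\n")
fresh_flag = flag_says_v2(graph, flag)
with open(graph, "wb") as f:  # "old Git" rewrites the graph file
    f.write(b"payload written by old Git" + bytes(range(20, 40)))
stale_flag = flag_says_v2(graph, flag)
```

With such a check, both a missing flag file and a stale one make new
Git fall back to treating the data as generation number v1.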

A metadata or versioning chunk cannot be deleted by mistake; but if old
Git copies unknown chunks to the new updated commit-graph file instead
of skipping them, we would need to add some kind of checksum (similarly
to the case for the flag file).  The problem to be solved is what to do
if some files in the chain of commit-graph files have v2 (and this
chunk), and some have v1 generation numbers (and do not have this
chunk).

About moving commit data with generation number v2 to a "CDA2" chunk: if
the "CDAT" chunk is missing then (I think) old Git would simply not use
the commit-graph file at all; it may crash, but I don't think so.  If
the "CDAT" chunk has zero length... I don't know what would happen then;
possibly old Git would also simply not use commit-graph data at all.
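
For illustration, this is roughly how a reader could decide which
commit data chunk is present.  It is a Python sketch of the decision
only, and it says nothing about what existing Git versions actually do
when "CDAT" is missing.  It assumes the header and chunk lookup table
layout as I understand it from the commit-graph format documentation:
an 8-byte header ("CGPH" signature, version, hash version, number of
chunks, number of base graphs) followed by 12-byte entries holding a
4-byte chunk ID and an 8-byte offset.  "CDA2" is the hypothetical chunk
name from this thread, and the function names are made up.

```python
import struct

def read_chunk_ids(data):
    """Return the chunk IDs listed in a commit-graph table of contents."""
    signature, version, hash_version, num_chunks, num_base = \
        struct.unpack_from(">4sBBBB", data, 0)
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file")
    ids = []
    for i in range(num_chunks):
        chunk_id, offset = struct.unpack_from(">4sQ", data, 8 + 12 * i)
        ids.append(chunk_id)
    return ids

def commit_data_format(data):
    ids = read_chunk_ids(data)
    if b"CDAT" in ids:
        return "v1"
    if b"CDA2" in ids:  # hypothetical chunk name from this thread
        return "v2"
    return None  # no usable commit data: do not use this commit-graph

def fake_header(chunk_ids):
    """Build only a header and table of contents, with dummy offsets."""
    out = struct.pack(">4sBBBB", b"CGPH", 1, 1, len(chunk_ids), 0)
    for cid in chunk_ids + [b"\0\0\0\0"]:  # terminating entry
        out += struct.pack(">4sQ", cid, 0)
    return out

old_style = commit_data_format(fake_header([b"OIDF", b"OIDL", b"CDAT"]))
new_style = commit_data_format(fake_header([b"OIDF", b"OIDL", b"CDA2"]))
no_data = commit_data_format(fake_header([b"OIDF", b"OIDL"]))
```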

Putting generation number v2 into a separate chunk (which might be
called "GEN2" or "OFFS"/"DOFF") has the disadvantage of increasing the
on-disk size of the commit graph, and possibly also increasing memory
consumption (the latter depends on how it would be handled), but has the
advantage of being fully backward compatible.  Old Git would simply
use generation numbers v1 in "CDAT", new Git would use generation
numbers v2 (combining commit creation date from "CDAT" and offset from
"OFFS"), and there should be no problems with updating the commit-graph
file (either rewriting it, or adding a new commit-graph to the chain).
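
A small Python sketch of that combination.  It assumes that the last 8
bytes of a "CDAT" entry hold the generation number v1 in the top 30
bits and the commit date in the remaining 34 bits, and that the new
chunk (called "OFFS" here, a hypothetical name) stores one 32-bit
unsigned big-endian offset per commit.  The function names and the
sample values are made up.

```python
import struct

def decode_cdat_tail(tail):
    """Split the last 8 bytes of a CDAT entry into
    (generation number v1, commit date)."""
    (word,) = struct.unpack(">Q", tail)
    return word >> 34, word & ((1 << 34) - 1)

def corrected_commit_date(cdat_tail, offs_entry):
    """Generation number v2 = commit date (CDAT) + offset ("OFFS")."""
    _, commit_date = decode_cdat_tail(cdat_tail)
    (offset,) = struct.unpack(">I", offs_entry)  # assumed 32-bit unsigned
    return commit_date + offset

# Made-up entry: topological level 3, commit date 1050, offset 51.
cdat_tail = struct.pack(">Q", (3 << 34) | 1050)
offs_entry = struct.pack(">I", 51)
```

Old Git would keep reading only the first function's result, while new
Git would additionally read the offset and add it to the commit date.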

I think that's all.

Best,
-- 
Jakub Narębski
