git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Shawn Pearce <spearce@spearce.org>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	David Borowitz <dborowitz@google.com>,
	Stefan Beller <sbeller@google.com>
Subject: Re: reftable [v6]: new ref storage format
Date: Tue, 15 Aug 2017 15:47:56 -0700	[thread overview]
Message-ID: <CAJo=hJum2boTfcXOaZVhxmbGB9Dygoc5=TM8RD2nqxo-Ahjv9g@mail.gmail.com> (raw)

On Mon, Aug 14, 2017 at 5:13 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 08/07/2017 03:47 AM, Shawn Pearce wrote:
>> 6th iteration of the reftable storage format.
>
> Thanks!
>
>> Changes from v5:
>> - extensions.refStorage = reftable is used to select this format.
>>
>> - Log records can be explicitly deleted (for refs/stash).
>> - Log records may use Michael Haggerty's chained idea to compress before zlib.
>>   This saved ~5.8% on one of my example repositories.
>
> Meh. Do you think that's worth the complexity? The percentage savings
> will presumably be even lower for repositories that store significant
> information in their reflog messages.

No, I don't. I'm quite happy to remove the chained compression. I'll
keep the explicit deletion support for refs/stash.


>> [...]
>> #### ref record
>> - `0x3`: symref and text: `varint( text_len ) text`
[...]
> I'm still relatively negative on storing "other" references (except
> `HEAD`) in reftable. Here are my thoughts:
>
> * "Other" references are not considered for reachability, so there
>   is no need for their modification to be done atomically.
>
> * "Other" references don't have or need reflogs.
>
> * The refs API would have to provide a way for other Git code to
>   read and write "other" references including their extra
>   information, and the users of that information would have to
>   be rewritten to use the new API.
>
> * Presumably there are other programs in the wild (e.g., scripts)
>   that want to read that information. They wouldn't be able to
>   extract it from reftable files themselves, so we would also have
>   to provide a command-line tool to read (and write?) such files.
>
> Regardless, I suggest allocating separate `value_type`s for generic
> symrefs (which then wouldn't need a `ref: ` prefix) vs. for "other"
> references.

Ack, I agree with you. Lets only store symrefs as 0x3, without the
"ref: " prefix nonsense, and don't support the "other" ref types. You
make good arguments above about why those would not be stored in a
reftable.


>> [...]
>> ### Ref index
>
> It wasn't clear to me whether (in the case of a multi-level index) ref
> index blocks have to be aligned in `block_size` blocks (both their
> maximum size and their alignment). I don't see a reason for that to be
> required, though of course a compactor implementation might choose to
> block-align these blocks based on the filesystem that is in use.
>
> For that matter, I don't see an intrinsic reason that object blocks or
> object index blocks need to be block aligned.

Yea, you are correct. There isn't an actual need for alignment.

> In fact, the only way I can see that the current reftable proposal makes
> use of `block_size` is so that `obj_record`s can record `block_delta` in
> units of `block_size` rather than in units of bytes. (And given that I'm
> skeptical about the value of the object index, that justification seems
> thin.)

This use of block_size in the obj_record also has been bothering me.
I'm changing it to use position, which removes any requirement on
alignment. It does cost a bit more space, but I'm willing to trade
that for simplification in the format definition.

> I totally accept that *you* want to align your blocks, and I'm totally
> supportive of a format that permits a reftable compactor to write
> reftables that are block-aligned. It just still seems to me that it
> imposes more complexity than necessary on *other* reftable compactor
> implementations that don't care about block alignment.
>
> Aside from the object index, I think it would be straightforward to
> write a reftable reader that is totally ignorant of `block_size`.

Yup, I think you are right. So I'll try to rework the document to make
it so alignment and padding are writer-local decisions. A writer can
choose to align, or choose to skip alignment. Readers should be
prepared for either.


>> [...]
>> #### index record
>>
>> An index record describes the last entry in another block.
>> Index records are written as:
>>
>>     varint( prefix_length )
>>     varint( (suffix_length << 3) | 0 )
>>     suffix
>>     varint( block_position )
>>
>> Index records use prefix compression exactly like `ref_record`.
>>
>> Index records store `block_position` after the suffix, specifying the
>> absolute position in bytes (from the start of the file) of the block
>> that ends with this reference.
>
> Is there a reason that the index lists the *last* refname that is
> contained in a block rather than the *first* refname? I can't think of a
> reason to choose one vs. the other, but your choice was initially
> surprising. I don't think it matters either way; I was just curious.

Yes, there is a reason. When a reader is searching the index block and
discovers a key that is greater than their search needle, they are now
sitting on a record with the block_position for that greater key. By
using the *last* refname the current block_position is the one to seek
to.

If instead we used *first* refname, the reader would now have to
backtrack to the prior index record to get the block_position out of
that record. Or it has to keep a running "prior_position" local
variable.

Using last simplifies the reader's code.


> Do I understand correctly that all `block_position`s are *byte*
> addresses, even in the `ref_index` where they should all be multiples of
> the block size (except the zeroth one)? I think that's OK, but note that
> it will waste more than a byte per `ref_index` and `obj_index` record,
> on average.

Yes, because it simplifies a lot of code, especially if we do away
with any sort of requirement for alignment.


>> Readers must examine the block header at `block_position` to determine
>> if the next block is another level index block, or the leaf-level ref
>> block.
>
> For scanning through a whole namespace, like `refs/tags/`, I guess you
> only need to use a binary search to find the beginning of the range.
> Then you would read serially forwards from there, continuing from one
> `ref_block` to the next, until you find a refname that doesn't start
> with `refs/tags/`. In other words, there is no reason to binary search
> to find the end of the namespace, correct?

Correct.


>> [...]
>> #### Importing logs
>>
>> When importing from `$GIT_DIR/logs` writers should globally order all
>> log records roughly by timestamp while preserving file order, and
>> assign unique, increasing `update_index` values for each log line.
>> Newer log records get higher `update_index` values.
>>
>> Although an import may write only a single reftable file, the reftable
>> file must span many unique `update_index`, as each log line requires
>> its own `update_index` to preserve semantics.
>
> Thinking out loud here: A really high-quality importer might want to
> group together, under the same `update_index`, ref updates that are
> thought originally to have been done in the same transaction.
[...]
> But I doubt that it is worth the effort. (The whole idea gives me nasty
> flashbacks from working on cvs2svn/cvs2git.)

Yup, that is why I didn't go down writing a description like that here. :)

             reply	other threads:[~2017-08-15 22:48 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-15 22:47 Shawn Pearce [this message]
2017-08-18  9:24 ` reftable [v6]: new ref storage format Michael Haggerty
  -- strict thread matches above, loose matches on Subject: below --
2017-08-07  1:47 Shawn Pearce
2017-08-07 18:27 ` Stefan Beller
2017-08-07 18:30   ` Shawn Pearce
2017-08-08 23:52     ` Stefan Beller
2017-08-08  7:28 ` Jeff King
2017-08-08 19:01 ` Junio C Hamano
2017-08-08 22:27   ` Shawn Pearce
2017-08-08 23:34     ` Junio C Hamano
2017-08-09  0:01       ` Shawn Pearce
2017-08-08 19:25 ` Junio C Hamano
2017-08-08 22:30   ` Shawn Pearce
2017-08-14 12:13 ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJo=hJum2boTfcXOaZVhxmbGB9Dygoc5=TM8RD2nqxo-Ahjv9g@mail.gmail.com' \
    --to=spearce@spearce.org \
    --cc=dborowitz@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).