From: Shawn Pearce <spearce@spearce.org>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	David Borowitz <dborowitz@google.com>, Jeff King <peff@peff.net>
Subject: Re: reftable [v7]: new ref storage format
Date: Wed, 16 Aug 2017 08:54:10 -0700
Message-ID: <CAJo=hJtAL9SX5Z4XyNFS5L+H3j3OaTe6t-ug7oefpNMXHh525A@mail.gmail.com>
In-Reply-To: <CAGZ79kZ4m0-KBFs1pbOvRqkR=0vn-Jbn1FATL_KzW+km0K-S2A@mail.gmail.com>

On Tue, Aug 15, 2017 at 11:15 PM, Stefan Beller <sbeller@google.com> wrote:
> On Tue, Aug 15, 2017 at 7:48 PM, Shawn Pearce <spearce@spearce.org> wrote:
>> 7th iteration of the reftable storage format.
>>
>> You can read a rendered version of this here:
>> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>>
>> Changes from v6:
>> - Blocks are variable sized, and alignment is optional.
>> - ref index is required on variable sized multi-block files.
>>
>> - restart_count/offsets are again at the end of the block.
>> - value_type = 0x3 is only for symbolic references.
>> - "other" files cannot be stored in reftable.
>>
>> - object blocks are explicitly optional.
>> - object blocks use position (offset in bytes), not block id.
>> - removed complex log_chained format for log blocks
>>
>> - Layout uses log, ref file extensions
>> - Described reader algorithm to obtain a snapshot
>
> - back to the old "intra-block index is last"
>   for all block types. ok.

Yes, it simplifies "streaming writers" who don't want to buffer a lot.

> - changed (only ref?) indexes to start char + 3 byte size:
>   Which starting char do object/log indexes have?

All index blocks use 'i'.
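
As a rough illustration of the "start char + 3 byte size" header, here is
a minimal sketch in Python. The function name `parse_block_header` is
made up for this example, and the sketch assumes the 3-byte size is an
unsigned big-endian integer; treat that byte order as an assumption to
check against the rendered spec linked above.

```python
def parse_block_header(buf: bytes):
    """Split a 4-byte block header into (block_type, block_len).

    Assumed layout: 1 byte block type, then a 3-byte size read as an
    unsigned big-endian integer (an assumption for this sketch).
    """
    if len(buf) < 4:
        raise ValueError("a block header needs at least 4 bytes")
    block_type = chr(buf[0])                     # e.g. 'i' for any index block
    block_len = int.from_bytes(buf[1:4], "big")  # 3-byte size field
    return block_type, block_len


def is_index_block(buf: bytes) -> bool:
    """True if the header carries the 'i' type shared by all index blocks."""
    return parse_block_header(buf)[0] == "i"
```

A reader cannot tell a ref index from an object index by the type byte
alone; it has to know which section of the file it is reading.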

> "Unaligned files must include the ref index to support fast lookup."
>
> Why this? I would imagine the client (which has ~5 branches),
> would not need this, but only a ref block, that's it.

The quoted part is, I think, incomplete. Unaligned files need the ref
index if there is more than one ref block, as there is no way to
divide the space for binary search. A single ref block with 5 branches
does not need the ref index.
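
Put as a predicate, the rule above might look like the following sketch.
The name `ref_index_required` and its parameters are hypothetical, not
taken from the spec or from JGit.

```python
def ref_index_required(aligned: bool, ref_block_count: int) -> bool:
    """Sketch of the rule: an unaligned file with more than one ref block
    has no fixed stride to binary search over, so it needs the ref index.
    Aligned files and single-block files can do without it.
    """
    return (not aligned) and ref_block_count > 1
```

So the 5-branch client case, with one ref block, comes out as not
needing an index whether or not the file is aligned.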

> Ctrl-F for 'block_size' reveals nothing is computed
> relative to the block_size in this format, yet we can
> set it to an arbitrary number. If following the spec,
> the reader at $DAY_JOB needs to be able to read
> both aligned and unaligned reftables, despite our plan
> to only ever write aligned reftables, what would the reader
> use the block_size for? (I think we can omit that field
> from the header/footer now, no?)

It's really helpful to have it present so the reader knows how to locate
and read blocks. If the ref index is missing and there are multiple
ref blocks in an aligned file, a reader can use block_size to divide
the space and perform binary search. Even when the ref index is
present, the reader can use block_size to issue a disk IO read of
block_size bytes without reading the block_len of the target block
first.
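
A minimal sketch of that binary search, in Python, under simplifying
assumptions: the ref blocks are treated as starting at offset 0 and
sitting at exact multiples of block_size (the file header is ignored),
and `read_at` and `first_key` are hypothetical callbacks standing in for
the disk read and for decoding the first ref name of a block.

```python
from typing import Callable, Optional


def find_block(ref_name: str,
               ref_area_len: int,
               block_size: int,
               read_at: Callable[[int, int], bytes],
               first_key: Callable[[bytes], str]) -> Optional[int]:
    """Return the offset of the aligned block that may hold ref_name.

    Binary search over block numbers: the candidate is the last block
    whose first key is <= ref_name. Returns None if ref_name sorts
    before the first key of the first block.
    """
    lo = 0
    hi = (ref_area_len + block_size - 1) // block_size - 1
    found = None
    while lo <= hi:
        mid = (lo + hi) // 2
        # One read of block_size bytes; no need to fetch block_len first.
        block = read_at(mid * block_size, block_size)
        if first_key(block) <= ref_name:
            found = mid * block_size   # candidate; try later blocks too
            lo = mid + 1
        else:
            hi = mid - 1
    return found
```

Without block_size in the header, the reader could not compute
`mid * block_size` and would be back to needing the index.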

At $DAY_JOB the block_size is tunable by the writer and could change
at any time, so it's useful to have it embedded in the output.

