git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Shawn Pearce <spearce@spearce.org>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>
Subject: Re: reftable [v2]: new ref storage format
Date: Mon, 17 Jul 2017 12:04:34 -0700	[thread overview]
Message-ID: <CAJo=hJuHsKY5YkmoBqCNvv2tvRdJFtD02JrnUko7iJbpSJWHgQ@mail.gmail.com> (raw)
In-Reply-To: <CAGZ79kaFmxRSp1YBXc=6YEeDGUO-VLBF0yk+Bb0np7x80z_Vog@mail.gmail.com>

On Mon, Jul 17, 2017 at 11:53 AM, Stefan Beller <sbeller@google.com> wrote:
> On Mon, Jul 17, 2017 at 8:01 AM, Shawn Pearce <spearce@spearce.org> wrote:
>
>> A ref block is written as:
>>
>>     'r'
>>     uint24 ( block_len )
>>     ref_record*
>>     uint32( restart_offset )*
>>     uint16( number_of_restarts )
>>     padding?
>>
> ...
>>
>> The 2-byte `number_of_restarts + 1` stores the number of entries in
>> the `restart_offset` list.
>
> This means uint16( number_of_restarts ) cannot be 0, but has to be 1
> to indicate no restarts?

A block must have at least one restart in it, the first ref_record
must be a restart. So number_of_restarts in the tail of the block can
be 0, which implies 1 restart (number_of_restarts + 1), and the first
restart is required at the first ref_record. :)

> When starting to write a block, we need to know exactly how large
> the ref_records* and restart offsets need to be to put the
> number_of_restarts at the position as promised via block_len.
> This sounds complicated unless I missed the obvious.

Correct. The writer needs to compute the block size before it writes
the block. It does so by buffering the block contents until its
approximately full, then fixes block_len, and flushes the block.

> Going by this, would it rather make sense to omit the block_len
> and then scan backwards from *block_size-1 to find the first non-NUL
> and that will be the number_of_restarts?

Not quite. On small reftable files the "physical" block may be shared
with a log block ('g'). We need to be able to reliably find the of the
ref block ('r'), without padding between the two blocks.

> Or we could allow additional padding between ref_record and
> restart_offsets, such that the writer implementation has wiggle room
> for the restarting logic.

I had that in an older format description, and removed it. Placing the
padding at the end of the block was simpler for the reader and writer
implementation to handle.


>> Log record keys are structured as:
>>
>>     ref_name '\0' reverse_int32( time_sec )
>>
>> where `time_sec` is the update time in seconds since the epoch.
>
> The epoch ends 2038, which is in 21 years. (Did you just volunteer
> to fixup the time issues in 20 years?)
> If possible I'd prefer a date to be encoded with more range available.

Good point. However, I think in 20 years we'll see 2 more hash
functions for Git, and we can bump reftable to v2 and expand the field
to 8 bytes.

> The ref names itself are compressed, would we also want to compress
> the timing information?

The time field is also prefix compressed as part of the ref name's
prefix compression. So there is no need to move to the complexity of a
varint or anything else.


>> ### Update transactions
>>
>> Although reftables are immutable, mutations are supported by writing a
>> new reftable and atomically appending it to the stack:
>>
>> 1. Atomically create `stack.lock`
>> 2. Copy current stack to `stack.lock`.
>> 3. Prepare new reftable in temp file `.refXXXXXXX`.
>>    Include log entries.
>> 4. Rename (3) to `${sha1}.ref`.
>> 5. Append `${sha1}.ref` to `stack.lock`
>> 6. Atomically rename `stack.lock` to `stack`.
>
> In case 3.+4. becomes time consuming, it can be prepared outside
> the lock, such that inside the lock we'd only need to check
> for contention of refs? For example if I'd want to update one ref and
> another write wants to update another refs, we'd both be preparing
> the a new refstable containing one ref and log, then one of us obtains
> the lock and writes. The second writer would then need to inspect
> the delta of the stack and see if any ref that they want to change
> was touched.

Excellent point, it reduces the contention window for non-conflicting
writes. I will update this section with your input, thank you Stefan.

  reply	other threads:[~2017-07-17 19:04 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-17 15:01 reftable [v2]: new ref storage format Shawn Pearce
2017-07-17 18:53 ` Stefan Beller
2017-07-17 19:04   ` Shawn Pearce [this message]
2017-07-17 19:56     ` Stefan Beller
2017-07-17 19:51 ` Junio C Hamano
2017-07-18 20:54   ` Shawn Pearce
2017-07-19 14:02     ` Ævar Arnfjörð Bjarmason
2017-07-23 21:46       ` Shawn Pearce
2017-07-23 23:47         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJo=hJuHsKY5YkmoBqCNvv2tvRdJFtD02JrnUko7iJbpSJWHgQ@mail.gmail.com' \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).