git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	David Borowitz <dborowitz@google.com>
Subject: Re: reftable [v4]: new ref storage format
Date: Wed, 02 Aug 2017 12:50:39 -0700	[thread overview]
Message-ID: <xmqqh8xpsq9c.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <CAJo=hJv7scc1L0_MdRkFeLAJGjYm2UkTFNOgj2e4+9Zj7KSiiQ@mail.gmail.com> (Shawn Pearce's message of "Sun, 30 Jul 2017 20:51:24 -0700")

Shawn Pearce <spearce@spearce.org> writes:

> ### Layout
>
> The `$GIT_DIR/refs` path is a file when reftable is configured, not a
> directory.  This prevents loose references from being stored.
>
> A collection of reftable files are stored in the `$GIT_DIR/reftable/`
> directory:
>
>     00000001_UF4paF
>     00000002_bUVgy4
>
> where reftable files are named by a unique name such as produced by
> the function:
>
>     mktemp "${update_index}_XXXXXX"
>
> The stack ordering file is `$GIT_DIR/refs` and lists the current
> files, one per line, in order, from oldest (base) to newest (most
> recent):
>
>     $ cat .git/refs
>     00000001_UF4paF
>     00000002_bUVgy4
>
> Readers must read `$GIT_DIR/refs` to determine which files are
> relevant right now, and search through the stack in reverse order
> (last reftable is examined first).
>
> Reftable files not listed in `refs` may be new (and about to be added
> to the stack by the active writer), or ancient and ready to be pruned.

I like the general idea, what the file format can represent and how
it does so, but I am a bit uneasy about how well this "stacked" part
would work for desktop clients.  The structure presented here is for
optimizing the "we want to learn about many (or all) refs" access
pattern, which probably matters a lot on the server implementations,
but I do not feel comfortable without knowing how much it penalizes
"I want the current value of this single ref" access pattern.

With the traditional "packed-refs plus loose" layout, no matter how
many times a handful of selected busy refs are updated during the
day, you'd need to open at most two files to find out the current
value of a single ref (admittedly, the accessing of the second file,
after we realize that there is no loose one, would be very costly).
If you make a few commits on a topic branch A, then build a 100
commit series on top of another topic branch B, finding the current
value of A is still one open and read of refs/heads/A.

With the reftable format, we'd need to open and read all 100
incremental transactions that touch branch B before realizing that
none of them talk about A, and read the next transaction file to
find the current value of A.  To keep this number low, we'd need
quite a frequent compaction.

We can just declare that reftable format is not for desktop clients
but for server implementations where frequent compaction would not
be an annoyance to the users, but I'd wish we do not have to.




  parent reply	other threads:[~2017-08-02 19:50 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-31  3:51 reftable [v4]: new ref storage format Shawn Pearce
2017-07-31 17:41 ` Dave Borowitz
2017-07-31 19:01 ` Stefan Beller
2017-07-31 23:05   ` Shawn Pearce
2017-07-31 19:42 ` Junio C Hamano
2017-07-31 23:43   ` Shawn Pearce
2017-08-01 16:08     ` Shawn Pearce
2017-08-01  6:41 ` Michael Haggerty
2017-08-01 20:23   ` Shawn Pearce
2017-08-02  0:49     ` Michael Haggerty
2017-08-01 23:27   ` Shawn Pearce
2017-08-01 23:54     ` Shawn Pearce
2017-08-02  1:51     ` Michael Haggerty
2017-08-02  2:38       ` Shawn Pearce
2017-08-02  9:28         ` Jeff King
2017-08-02 15:17           ` Shawn Pearce
2017-08-02 16:51             ` Junio C Hamano
2017-08-02 17:28             ` Jeff King
2017-08-02 12:20         ` Dave Borowitz
2017-08-02 17:18           ` Jeff King
2017-08-03 18:38         ` Michael Haggerty
2017-08-03 22:26           ` Shawn Pearce
2017-08-03 22:48             ` Michael Haggerty
2017-08-04  2:50               ` Shawn Pearce
2017-08-05 21:00       ` Shawn Pearce
2017-08-01 13:54 ` Dave Borowitz
2017-08-01 15:27   ` Shawn Pearce
2017-08-02 19:50 ` Junio C Hamano [this message]
2017-08-02 20:28   ` Jeff King
2017-08-03 22:17     ` Shawn Pearce
2017-08-03  1:50   ` Junio C Hamano
2017-08-03  2:21     ` Shawn Pearce
2017-08-03  2:36       ` Junio C Hamano
2017-08-02 19:54 ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqh8xpsq9c.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=dborowitz@google.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).