From: Shawn Pearce <>
To: Jeff King <>
Cc: git <>,
	David Turner <>,
	Michael Haggerty <>
Subject: Re: RefTree: Alternate ref backend
Date: Thu, 17 Dec 2015 14:28:01 -0800

On Thu, Dec 17, 2015 at 2:10 PM, Jeff King <> wrote:
> On Thu, Dec 17, 2015 at 01:02:50PM -0800, Shawn Pearce wrote:
>> I started playing around with the idea of storing references directly
>> in Git. Exploiting the GITLINK tree entry, we can associate a name to
>> any SHA-1.
> Gitlink entries don't imply reachability, though. I guess that doesn't
> matter if your ref backend says "no, really, these are the ref tips, and
> they are reachable".

Exactly. This works with existing JGit because it swaps out the ref
backend. When GC tries to enumerate the roots (current refs), it gets
these through the ref backend by scanning the tree recursively. The
packer itself doesn't care where those roots came from.

Same would be true for any other pluggable ref backend in git-core. GC
has to ask the ref backend, and then trust its reply. How/where that
ref backend tracks that is an implementation detail.
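
A minimal sketch of that recursive scan, assuming a hypothetical
in-memory tree model (this is not JGit's API, and the `refs/...` layout
of the tree is only illustrative). Each gitlink entry found becomes one
root handed to GC:

```python
# Hypothetical in-memory model: a tree maps entry name -> (mode, value).
# Mode 0o040000 is a subtree (value is a nested dict); mode 0o160000 is a
# gitlink (value is the SHA-1 the ref points at).
TREE_MODE = 0o040000
GITLINK_MODE = 0o160000


def enumerate_roots(tree, prefix=""):
    """Recursively yield (ref_name, sha1) for every gitlink in the tree."""
    for name, (mode, value) in sorted(tree.items()):
        path = prefix + name
        if mode == TREE_MODE:
            # Descend into the subtree, extending the ref name.
            yield from enumerate_roots(value, path + "/")
        elif mode == GITLINK_MODE:
            # A gitlink names a ref tip; the backend reports it as a root.
            yield path, value


# Illustrative ref tree with placeholder SHA-1 values.
ref_tree = {
    "refs": (TREE_MODE, {
        "heads": (TREE_MODE, {
            "master": (GITLINK_MODE, "a" * 40),
        }),
        "tags": (TREE_MODE, {
            "v1.0": (GITLINK_MODE, "b" * 40),
        }),
    }),
}

roots = dict(enumerate_roots(ref_tree))
```

The packer only ever sees the resulting name/SHA-1 pairs, which is why
it does not need to know they came from a tree.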

>  But you could not push the whole thing up to
> another server and expect it to hold the whole graph.

Correct, pushing this to another repository doesn't transmit the
graph. If the other repository also used this for its refs backend,
it is now corrupt and confused out of its mind. Just like copying the
packed-refs file with scp. Don't do that. :)

> Which is not strictly necessary, but to me seems like the real advantage
> of using git objects versus some other system.

One advantage is you can edit the HEAD symref remotely. Commit a
different symlink value and push. :)

I want to say more, but I'm going to hold back right now. There's more
going on in my head than just this.

> Of course, the lack of reachability has advantages, too. You can
> drop commits pointed to by old reflogs without rewriting the ref
> history.


> Unfortunately you cannot expunge the reflogs at all. That's
> good if you like audit trails. Bad if you are worried that your reflogs
> will grow large. :)

At present our servers do not truncate their reflogs. Yes, some are... big.

I considered truncating this graph by just using a shallow marker. Add
a shallow entry and repack. The ancient history will eventually be
garbage collected and disappear.

One advantage of this format is deleted branches can retain a reflog
post deletion. Another is you can trivially copy the reflog using
native Git to another system for backup purposes. Or fetch it over the
network to inspect locally. So a shared group server could be
exporting its reflog, you can fetch it and review locally what
happened to branches without logging into the shared server.

So long as you remember that copying the reflog doesn't mean you
actually copied the commit histories, it works nicely.

Another advantage of this format over LMDB or TDB or whatever is that Git
already understands it. The tools already understand it. Plumbing can
inspect and repair things. You can reflog the reflog using a traditional
reflog ($GIT_DIR/logs/refs/txn/committed).

>> By storing all references in a single tree, atomic transactions are
>> possible. It's a simple compare-and-swap of a single 40 byte SHA-1.
>> This of course leads to a bootstrapping problem: where do we store the
>> 40 byte SHA-1? For this example it's just $GIT_DIR/refs/txn/committed
>> as a classical loose reference.
> Somehow putting it inside `refs/` seems weird to me, in an infinite
> recursion kind of way.  I would have picked $GIT_DIR/REFSTREE or
> something. But that is a minor point.

I had started with $GIT_DIR/REFS, but see above. I have more going on
in my head. This is only a tiny building block.
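
For the compare-and-swap quoted above, a rough sketch of the idea. The
`BootstrapRef` class and its lock are hypothetical stand-ins for
whatever actually guards the bootstrap reference; this is not how the
prototype implements it:

```python
import threading


class BootstrapRef:
    """Hypothetical stand-in for the single pointer to the committed ref tree."""

    def __init__(self, sha1):
        self._sha1 = sha1
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._sha1

    def compare_and_swap(self, expected, new):
        """Install `new` only if the pointer still equals `expected`."""
        with self._lock:
            if self._sha1 != expected:
                return False  # another transaction committed first
            self._sha1 = new
            return True


# Two transactions both start from the same old tree (placeholder SHA-1s).
old_tree = "1" * 40
committed = BootstrapRef(old_tree)
first = committed.compare_and_swap(old_tree, "2" * 40)   # wins
second = committed.compare_and_swap(old_tree, "3" * 40)  # stale, rejected
```

Because every ref lives under that one pointer, a multi-ref update
either lands as a whole or not at all. The loser rereads the pointer and
retries against the new tree.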

>> Configuration:
>>   [core]
>>     repositoryformatversion = 1
>>   [extensions]
>>     refsBackendType = RefTree
> The semantics of extensions config keys are open-ended. The
> formatVersion=1 spec only says "if there is a key you don't know about,
> then you may not proceed". Now we're defining a refsBackendType
> extension. It probably makes sense to write up a few rules (e.g., is
> RefTree case-sensitive?).

In my prototype in JGit I parse it case-insensitively, but used
CamelCase because the JavaClassNameIsNamedThatWayBecauseJava.
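
As an illustration only, case-insensitive matching could look like the
sketch below. The function name and the registry are hypothetical, and
rejecting unknown values is an assumption made by analogy with the
"you may not proceed" rule for unknown extension keys:

```python
# Hypothetical registry of known backends, keyed by lower-cased name.
KNOWN_BACKENDS = {"reftree": "RefTree"}


def parse_refs_backend_type(value):
    """Return the canonical backend name, matching case-insensitively."""
    canonical = KNOWN_BACKENDS.get(value.strip().lower())
    if canonical is None:
        # Assumption: an unrecognized value is treated like an unknown
        # extension key, so the caller must refuse to proceed.
        raise ValueError(f"unknown refsBackendType: {value!r}")
    return canonical
```

With this, `RefTree`, `reftree` and `REFTREE` all select the same
backend.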
