From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	David Borowitz <dborowitz@google.com>
Subject: Re: reftable [v4]: new ref storage format
Date: Wed, 2 Aug 2017 05:28:46 -0400
Message-ID: <20170802092846.u4lyiogvvl7ezdfq@sigill.intra.peff.net>
In-Reply-To: <CAJo=hJv=zJvbzfAZwspxECXrnBJR4XfJbGZegsNUCx=6uheO2Q@mail.gmail.com>

On Tue, Aug 01, 2017 at 07:38:37PM -0700, Shawn Pearce wrote:

> > OBJS blocks can also be
> > unbounded in size if very many references point at the same object,
> > though that is perhaps only a theoretical problem.
> 
> Gah, I missed that in reftable. The block id pointer list could cause
> a single object id to exceed what fits in a block, and that will cause
> the writer to fail unless its caller sets the block size larger. I
> basically assumed this overflow condition is very unlikely, as it's not
> common to have a huge number of refs pointing to the same object.

It's actually quite common for us, as we have big shared-object repos
that contain a copy of the refs of all of their child repos (for
reachability during packing, etc). So for tags, where the value is the
same in every fork, you end up with one ref per fork pointing at the
same object.

Just peeking at torvalds/linux, we have some objects with ~35K refs
pointing to them (e.g., the v2.6.11 tag).
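
If you want to measure this on your own repository, something like the
following would do it. This is only a sketch: the helper name
count_refs_per_object is made up for illustration, and it assumes input
in the form printed by
"git for-each-ref --format='%(objectname) %(refname)'".

```shell
# Hypothetical helper: count how many refs point at each object.
# Input (stdin): one "<object-id> <refname>" pair per line.
# Output: "<count> <object-id>" lines, most-referenced object first.
count_refs_per_object() {
    awk '{ n[$1]++ } END { for (oid in n) print n[oid], oid }' |
        sort -k1,1nr -k2,2
}
```

Piping the for-each-ref output of a shared-object repo through
count_refs_per_object and looking at the first few lines shows the
objects most likely to overflow a single block.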

> > Peff and I discussed off-list whether the lookup-by-SHA-1 feature is
> > so important in the first place. Currently, all references must be
> > scanned for the advertisement anyway,
> 
> Not really. You can hide refs and allow-tip-sha1 so clients can fetch
> a ref even if it wasn't in the advertisement. We really want to use
> that wire protocol capability with Gerrit Code Review to hide the
> refs/changes/ namespace from the advertisement, but allow clients to
> fetch any of those refs if they send its current SHA-1 in a want line
> anyway.
> 
> So a server could scan only the refs/{heads,tags}/ prefixes for the
> advertisement, and then leverage the lookup-by-SHA1 to verify other
> SHA-1s sent by the client.

Yeah, that makes sense (though I hope in the end that strategy will go
away in favor of a better protocol, as getting the sha1 out-of-band has
obvious UX complexities).
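
For reference, the server-side setup being discussed looks roughly like
the sketch below. The config keys uploadpack.hideRefs and
uploadpack.allowTipSHA1InWant are real; the helper name
hide_changes_namespace and the choice of refs/changes as the hidden
hierarchy are just assumptions for illustration.

```shell
# Hypothetical helper: hide refs/changes from the advertisement, but
# still let clients fetch the tip of a hidden ref by its object id.
# $1 = path to the server-side repository
hide_changes_namespace() {
    git -C "$1" config --add uploadpack.hideRefs refs/changes &&
    git -C "$1" config uploadpack.allowTipSHA1InWant true
}
```

A client that learned the object id out-of-band would then ask for it
directly, e.g. "git fetch origin <sha1>".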

> > OTOH a mythical protocol v2 might reduce the need to scan the
> > references for advertisement, so maybe this optimization will be more
> > helpful in the future?
> 
> Yes, I'm hopeful we can get a v2 protocol built on the work Jonathan
> Tan is doing, and switch the advertisement around to "client speaks
> first", so that we can be smarter on the server about which refs are
> read and sent. That is a long way off, let's say 3-5 years before it's
> really common in clients.

I was actually planning to spend some time on this in the next month or
two. I don't think it needs to be that complicated. We don't need a
whole protocol revamp. We just need a way to get a few bits from the
client before the advertisement, and from there we can bootstrap any
more radical protocol changes we want.

I know it will take a while before it's something we can expect in
clients, but it's definitely worth planning around. And sometimes a
feature like this can drive upgrades, if it's something that produces an
immediate and obvious benefit to the client.

> Servers today don't update HEAD reflog when a branch is pushed. I
> think trying to record that is overkill, you have the reflog data in
> the ref itself that the client sent the command to modify.

I think they do, at least for C git:

  $ git init --bare dst.git
  $ git -C dst.git config core.logallrefupdates
  $ git push dst.git
  ...
  To dst.git
   * [new branch]      master -> master
  $ find dst.git/logs -type f | xargs wc -l
    1 dst.git/logs/refs/heads/master
    1 dst.git/logs/HEAD

The special logic for "see if we're updating the ref that HEAD points
to" is deep in the ref machinery, so it gets triggered for all updates,
including pushes.

I agree it's not actually that interesting for a bare repo, where HEAD
isn't that meaningful (and doesn't tend to change a lot anyway).

> > That's what I was thinking. But I've yet to hear anybody complain
> > about missing reflogs for symrefs if the underlying reference is
> > updated directly, so maybe we should just leave that "problem"
> > unsolved. It is certainly simpler and less brittle not to have to keep
> > backreferences like these in sync with the forward references.
> 
> Yup, that is my take on it, and why I didn't try to put this into
> reftable drafts, even though it was discussed between us on the list
> in earlier messages.

Yeah, I'd agree. It might be worth doing a better job of showing the
before/after destinations in the reflog when updating a symbolic ref,
which would let you reconstruct the state from the pointed-to reflogs if
you cared to. But that's orthogonal to the storage format (you can do it
already if you bother to pass a good message to "symbolic-ref -m").
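
As a rough sketch of what "a good message" could mean here: record the
old and new destinations in the reflog message yourself. The function
names and the message format below are made up for illustration; only
"git symbolic-ref" and its -m option are real.

```shell
# Hypothetical helper: build a reflog message naming both destinations.
# $1 = old target ref, $2 = new target ref
symref_move_message() {
    printf 'retarget: %s -> %s\n' "$1" "$2"
}

# Hypothetical helper: repoint a symbolic ref and log where it came from.
# $1 = symref name (e.g. HEAD), $2 = new target ref
retarget_symref() {
    old=$(git symbolic-ref "$1") || return 1
    git symbolic-ref -m "$(symref_move_message "$old" "$2")" "$1" "$2"
}
```

With that in the symref's reflog, you can walk the reflogs of the old
and new targets to work out what the symref resolved to at a given
time. It only helps if reflogs are being written in that repository.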

-Peff
