git@vger.kernel.org mailing list mirror (one of many)
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 0/2] receive-pack: use advertised reference tips to inform connectivity check
Date: Tue, 1 Nov 2022 12:49:51 +0100
Message-ID: <Y2EH3+l62S55EJOZ@ncase>
In-Reply-To: <Y2DgJi6foPyBhycU@coredump.intra.peff.net>


On Tue, Nov 01, 2022 at 05:00:22AM -0400, Jeff King wrote:
> On Fri, Oct 28, 2022 at 04:42:19PM +0200, Patrick Steinhardt wrote:
> 
> >     - A client shouldn't assume objects to exist that have not been part
> >       of the reference advertisement. But if it excluded an object from
> >       the packfile that is reachable via any ref that is excluded from
> >       the reference advertisement due to `transfer.hideRefs` we'd have
> >       accepted the push anyway. I'd argue that this is a bug in the
> >       current implementation.
> 
> Like others, I don't think this is a bug exactly. We'd never introduce a
> corruption. We're just more lenient with clients than we need to be.
> 
> But I don't think your scheme changes that. In a sense, the tips used by
> "rev-list --not --all" are really an optimization. We will walk the
> history from the to-be-updated ref tips all the way down to the roots if
> we have to. So imagine that I have object X which is not referenced at
> all (neither hidden nor visible ref). We obviously do not advertise it
> to the client, but let's further imagine that a client sends us a pack
> with X..Y, and a request to update some ref to Y.
> 
> Both before and after your code, if rev-list is able to walk down from Y
> until either we hit all roots or all UNINTERESTING commits, it will be
> satisfied. So as long as the receiving repo actually has all of the
> history leading up to X, it will allow the push, regardless of your
> patch.

Oh, right! Now I see where my thinko was, which means both you and
Taylor are correct. I somehow assumed that we'd fail the connectivity
check in that case, but all it means is that we now potentially walk
more objects than we'd have done if we used `--not --all`.
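
To make the point above concrete, here is a toy model of the walk. It is
only a sketch under stated assumptions, not how rev-list is implemented:
the object store is a plain dict of commit -> parents, and the names
`check_connected`, `store` and `boundary_tips` are made up for this
illustration. It shows that dropping tips from the boundary does not make
the check fail, it only makes the walk longer.

```python
# Toy model of a connectivity check. NOT git's implementation: the
# "object store" is a dict mapping commit id -> list of parent ids.
def check_connected(store, new_tip, boundary_tips=()):
    """Walk from new_tip towards the roots.

    Commits reachable from boundary_tips are treated as UNINTERESTING
    and stop the walk. Returns (ok, walked), where ok is False if a
    commit we need to walk is missing from the store.
    """
    # Mark everything reachable from the boundary tips as uninteresting.
    uninteresting = set()
    stack = [t for t in boundary_tips if t in store]
    while stack:
        c = stack.pop()
        if c not in uninteresting:
            uninteresting.add(c)
            stack.extend(p for p in store[c] if p in store)

    # Walk from the to-be-updated tip until we hit the boundary or roots.
    walked = 0
    seen = set()
    stack = [new_tip]
    while stack:
        c = stack.pop()
        if c in uninteresting or c in seen:
            continue
        if c not in store:
            return False, walked  # missing object: reject the push
        seen.add(c)
        walked += 1
        stack.extend(store[c])
    return True, walked


# History A <- B <- X <- Y. A visible ref points at B, nothing points at
# X, and the client pushes X..Y asking to update a branch to Y.
store = {"A": [], "B": ["A"], "X": ["B"], "Y": ["X"]}

with_tips = check_connected(store, "Y", ["B"])  # (True, 2): stops at B
without_tips = check_connected(store, "Y")      # (True, 4): walks to the root
```

Both calls accept the push. The only difference is how many commits get
walked, which is the "optimization" reading of the tips.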

> If we wanted to stop being lenient, we'd have to actually check that
> every object we traverse is either reachable, or came from the
> just-pushed pack.

Yes, indeed.

> There's also a subtle timing issue here. Our connectivity check happens
> after we've finished receiving the pack. So not only are we including
> hidden refs, but we are using the ref state at the end of the push
> (after receiving and processing the incoming pack), rather than the
> beginning.
> 
> From the same "leniency" lens this seems like the wrong thing. But as
> above, it doesn't matter in practice, because these tips are really an
> optimization to tell rev-list that it can stop traversing.
> 
> If you think of the connectivity check less as "did the client try to
> cheat" and more as "is it OK to update these refs without introducing a
> corruption", then it makes sense that you'd want to read the inputs
> to the check as close to the ref update as possible, because it shrinks
> the race window which could introduce corruption.

Agreed.

> Imagine a situation like this:
> 
>   0. We advertise to client that we have commit X.
> 
>   1. Client starts pushing up a pack with X..Y and asks to update some
>      branch to Y.
> 
>   2. Meanwhile, the branch with X is deleted, and X is pruned.
> 
>   3. Server finishes receiving the pack. All looks good, and then we
>      start a connectivity check.
> 
> In the current code, that check starts with the current ref state (with
> X deleted) as a given, and makes sure that we have the objects we need
> to update the refs. After your patches, it would take X as a given, and
> stop traversing when we see it.
> 
> That same race exists before your patch, but it's between the time of
> "rev-list --not --all" running and the ref update. After your patch,
> it's between the advertisement and the ref update, which can be a long
> time (hours or even days, if the client is very slow).
> 
> In practice I'm not sure how big a deal this is. If we feed the
> now-pruned X to rev-list, it may notice that X went away, though we've
> been reducing the number of checks there in the name of efficiency
> (e.g., if it's still in the commit graph, we'd say "OK, good enough"
> these days, even if we don't have it on disk anymore).
> 
> But it feels like a wrong direction to make that race longer if there's
> no need to.

Good point.
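
The race in steps 0-3 above can be sketched with a toy model. This is a
deliberately pessimistic sketch: it assumes the tips handed to the check
are trusted without being looked up, which is the worst case described
above and not necessarily what rev-list does. The names `push_is_safe`,
`store_after_prune` and friends are hypothetical.

```python
# Toy model of the race. NOT git's implementation: the object store is a
# dict of commit id -> parent ids, and "given" tips are trusted without
# being looked up (the worst case for the race).
def push_is_safe(store, new_tip, given_tips):
    """Return True if every commit walked from new_tip exists in store.

    The walk stops at any commit in given_tips without checking that the
    commit is still present.
    """
    seen = set()
    stack = [new_tip]
    while stack:
        c = stack.pop()
        if c in given_tips or c in seen:
            continue
        if c not in store:
            return False  # we would need this object but do not have it
        seen.add(c)
        stack.extend(store[c])
    return True


# Step 0: we advertised X (history A <- X, with another ref at A).
advertised_tips = {"A", "X"}

# Steps 1-3: the pack with X..Y has arrived, but meanwhile the branch
# pointing at X was deleted and X was pruned.
store_after_prune = {"A": [], "Y": ["X"]}
current_ref_tips = {"A"}

# Using the current ref state: X is not a given, so the walk notices
# that it is missing and the push is rejected.
checked_against_current_refs = push_is_safe(
    store_after_prune, "Y", current_ref_tips)

# Using the advertised tips: X is taken as a given, the walk stops
# there, and a ref would be updated to a history with a hole in it.
checked_against_advertisement = push_is_safe(
    store_after_prune, "Y", advertised_tips)
```

In this model the first check returns False and the second returns True,
which is the longer race window in a nutshell.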

> So all that said...
> 
> >     - Second, by using advertised refs as inputs instead of `git
> >       rev-list --not --all` we avoid looking up all refs that are
> >       irrelevant to the current push. This can be a huge performance
> >       improvement in repos that have a huge amount of internal, hidden
> >       refs. In one of our repos with 7m refs, of which 6.8m are hidden,
> >       this speeds up pushes from ~30s to ~4.5s.
> 
> I like the general direction here of avoiding the hidden refs. The
> client _shouldn't_ have been using them, so we can optimistically assume
> they're useless (and in the case of races or other weirdness, rev-list
> just ends up traversing a bit further).
> 
> But we can split the two ideas in your series:
> 
>   1. Feed the advertised tips from receive-pack to rev-list.
> 
>   2. Avoid traversing from the hidden tips.
> 
> Doing (1) gets you (2) for free. But if we don't want to do (1), and I
> don't think we do, we can get (2) by just teaching rev-list to narrow
> the check.
> 
> I see some discussion in the other part of the thread, and we may need a
> new rev-list option to do this, as mentioned there. However, you _might_
> be able to do it with the existing --exclude mechanism. I.e., something like:
> 
>   rev-list --stdin --not --exclude 'refs/hidden/*' --all

Yeah, Taylor proposed to add a new `--visible-refs=receive` option that
lets git-rev-list(1) automatically add all references that are visible
when paying attention to `receive.hideRefs`. I actually like this idea
and will likely have a look at how easy or hard it is to implement.

> The gotchas are:
> 
>   - I'm not 100% sure that --exclude globbing and transfer.hideRefs
>     syntax are compatible. You'd want to check.
> 
>   - these would have to come on the command line (at least with the
>     current code). Probably nobody has enough hiderefs patterns for that
>     to be a problem (and remember we are passing the glob pattern here,
>     not the 6.8M refs themselves). But it could bite somebody in a
>     pathological case.
> 
> -Peff

Well, we can avoid these gotchas if we used `--visible-refs`.
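
For illustration, the narrowing itself boils down to filtering the ref
tips before they are handed to the walk. The sketch below uses Python's
shell-style `fnmatch` matching purely as a stand-in: it is neither the
matching done by `--exclude` nor the `transfer.hideRefs` syntax, and
whether those two are compatible is exactly the open question above. The
function name `visible_tips` and the ref names are made up.

```python
from fnmatch import fnmatchcase


# Toy filter: drop every ref whose name matches one of the hidden
# patterns. Python's fnmatch is used as a stand-in only; it is not the
# pattern syntax of --exclude or of hideRefs.
def visible_tips(refs, hidden_patterns):
    """Return the subset of refs (name -> object id) that is not hidden."""
    return {
        name: oid
        for name, oid in refs.items()
        if not any(fnmatchcase(name, pat) for pat in hidden_patterns)
    }


refs = {
    "refs/heads/main": "B",
    "refs/hidden/pr/1": "X",
    "refs/hidden/pr/2": "Z",
}

# Only the pattern is passed around, not the hidden refs themselves.
tips = visible_tips(refs, ["refs/hidden/*"])
```

Here `tips` ends up containing only `refs/heads/main`, so the hidden
tips never reach the walk at all.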

Patrick