git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Eric Wong <e@80x24.org>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] update-server-info: avoid needless overwrites
Date: Tue, 14 May 2019 05:47:29 -0400	[thread overview]
Message-ID: <20190514094729.GA12256@sigill.intra.peff.net> (raw)
In-Reply-To: <87tve0w3ao.fsf@evledraar.gmail.com>

On Sun, May 12, 2019 at 09:16:31AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > You're probably right (especially because we'd just spent O(n) work
> > generating the candidate file). But note that I have seen pathological
> > cases where info/refs was gigabytes.
> 
> Wouldn't those users be calling update-server-info from something like a
> post-receive hook where they *know* the refs/packs just got updated?
>
> Well, there is "transfer.unpackLimit" to contend with, but that's just
> for "packs are same", not "refs are same", and that file's presumably
> much smaller.

Yeah, I think there's sort of an open question here of who is calling
update-server-info when nothing got updated. I think the only place we
call it automatically is via receive-pack, but I'd guess Eric runs it as
part of public-inbox scripts.

I agree that this is probably mostly about info/packs. Not every push
(or repo update) will create one, but every push _should_ be changing a
ref (if it succeeds at all).  And I'd guess Eric's interest comes from
the use of Git in public-inbox, which is going to make frequent but
small updates.

So this does seem like a really niche case; it avoids one single HTTP
request of a small that should generally be small (unless you're letting
your pack list grow really big, but I think there are other issues with
that) in a case that we know will generate a bunch of other HTTP traffic
(if somebody updated the server info, there was indeed a push, so you'd
get a refreshed info/refs and presumably the new loose objects).

That said, the logic to preserve the mtime is pretty straightforward, so
I don't mind it too much if Eric finds this really improves things for
him.

> > I don't think our usual dot-locking is great here. What does the
> > race-loser do when it can't take the lock? It can't just skip its
> > update, since it needs to make sure that its new pack is mentioned.
> 
> Right, I mean a *global* .git/I_AM_DOING_WRITES.lock, because there's no
> way to square the ref backend's notion of per-ref ref lock enabling
> concurrent pushes with update-server-info's desire to generate metadata
> showing *all* the refs.

OK. I agree that would work, but it's nasty to delay user-initiated
operations for ongoing maintenance (another obvious place to want such a
lock is for pruning, which can take quite a while).

> > So it has to wait and poll until the other process finishes. I guess
> > maybe that isn't the end of the world.
> 
> If "its" is update-server-info this won't work. It's possible for two
> update-server-info processes to be racing in such a way that their
> for_each_ref() reads a ref at a given value that'll be updated 1
> millisecond later, but to then have that update's update-server-info
> "win" the race to update the info files (hypothetical locks on those
> info files and all).
> 
> Thus the "info" files will be updated to old values, since we'll see we
> need changes, but change things to the old values.
> 
> All things that *can* be dealt with in some clever ways, but I think
> just further proof nobody's using this for anything serious :)
> 
> I.e. the "commit A happened before B" but also "commit B's post-* hooks
> finished after A" is a thing that happens quite frequently (per my
> logs).

I think it would work because any update-server-info, whether from A or
B, will take into account the full current repo state (and we don't look
at that state until we take the lock). So you might get an interleaved
"A-push, B-push, B-maint, A-maint", but that's OK. A-maint will
represent B's state when it runs.

> > I'm not entirely sure of all of the magic that "stale" check is trying
> > to accomplish. I think there's some bits in there that try to preserve
> > the existing ordering, but I don't know why anyone would care (and there
> > are so many cases where the ordering gets thrown out that I think
> > anybody who does care is likely to get disappointed).
> 
> My reading of it is that it's premature optimization that can go away
> (and most of it has already), for "it's cheap" and "if not it's called
> from the 'I know I had an update'" hook case, as noted above.<

That's my reading, too, but I didn't want to be responsible for
regressing some obscure case. At least Eric seems to _use_
update-server-info. ;)

-Peff

  reply	other threads:[~2019-05-14  9:47 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-11  1:34 Eric Wong
2019-05-11  7:35 ` Eric Sunshine
2019-05-11 20:47   ` [PATCH v2] " Eric Wong
2019-05-11 21:17 ` [PATCH] " Eric Wong
2019-05-11 23:37 ` Ævar Arnfjörð Bjarmason
2019-05-12  0:38   ` Eric Wong
2019-05-12  4:08   ` Jeff King
2019-05-12  7:16     ` Ævar Arnfjörð Bjarmason
2019-05-14  9:47       ` Jeff King [this message]
2019-05-14 10:33         ` Ævar Arnfjörð Bjarmason
2019-05-14 11:24           ` Jeff King
2019-05-14 11:57             ` Ævar Arnfjörð Bjarmason
2019-05-14 11:50         ` Eric Wong
2019-05-14 12:13           ` dumb HTTP things I want to do Eric Wong
2019-05-14 12:27             ` Jeff King
2019-05-14 12:19           ` [PATCH] update-server-info: avoid needless overwrites Ævar Arnfjörð Bjarmason
2019-05-14 12:29             ` Jeff King
2019-05-15  0:45             ` [PATCH 2/1] server-info: conditionally update on fetch Eric Wong
2019-05-15 20:38               ` [WIP] repack leaving stale entries in objects/info/packs Eric Wong
2019-05-15 21:48                 ` Jeff King
2019-05-23  8:59                   ` [PATCH] server-info: do not list unlinked packs Eric Wong
2019-05-23 10:24                     ` Jeff King
2019-05-23 17:27                       ` [PATCH v2] " Eric Wong
2019-05-24  6:05                         ` Jeff King
2019-05-24  7:34                         ` Ævar Arnfjörð Bjarmason
2019-05-13 23:17 ` [PATCH v3] update-server-info: avoid needless overwrites Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190514094729.GA12256@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --subject='Re: [PATCH] update-server-info: avoid needless overwrites' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).