git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Eric Wong <e@80x24.org>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] update-server-info: avoid needless overwrites
Date: Sun, 12 May 2019 00:08:26 -0400	[thread overview]
Message-ID: <20190512040825.GA25370@sigill.intra.peff.net> (raw)
In-Reply-To: <87v9ygwoj0.fsf@evledraar.gmail.com>

On Sun, May 12, 2019 at 01:37:55AM +0200, Ævar Arnfjörð Bjarmason wrote:

> This way of doing it just seems so weirdly convoluted. Read them one at
> a time, compute the SHA-1, just to see if they're different. Why not
> something closer to a plain memcmp():

FWIW, I had the exact same thought on reading the patch. Checking sizes
seems like an easy optimization and I don't mind it, but computing
hashes when we could just compare the bytes seems pointless. I'd have
expected to stream 4k blocks as we go.

> I.e. optimze for code simplicity with something close to a dumb "cmp
> --silent". I'm going to make the bold claim that nobody using "dumb
> http" is going to be impacted by the performance of reading their
> for-each-ref or for-each-pack dump, hence opting for not even bothing to
> stat() to get the size before reading.

You're probably right (especially because we'd just spent O(n) work
generating the candidate file). But note that I have seen pathological
cases where info/refs was gigabytes.

> >  /*
> >   * Create the file "path" by writing to a temporary file and renaming
> >   * it into place. The contents of the file come from "generate", which
> >   * should return non-zero if it encounters an error.
> >   */
> > -static int update_info_file(char *path, int (*generate)(FILE *))
> > +static int update_info_file(char *path, int (*generate)(FILE *), int force)
> >  {
> >  	char *tmp = mkpathdup("%s_XXXXXX", path);
> 
> Unrelated to this, patch, but I hadn't thought about this nasty race
> condition. We recommend users run this from the "post-update" (or
> "post-receive") hook, and don't juggle the lock along with the ref
> update, thus due to the vagaries of scheduling you can end up with two
> concurrent ref updates where the "old" one wins.
> 
> But I guess that brings me back to something close to "nobody with that
> sort of update rate is using 'dumb http'" :)
> 
> For this to work properly we'd need to take some sort of global "ref
> update/pack update" lock, and I guess at that point this "cmp" case
> would be a helper similar to commit_lock_file_to(),
> i.e. a commit_lock_file_to_if_different().

I don't think our usual dot-locking is great here. What does the
race-loser do when it can't take the lock? It can't just skip its
update, since it needs to make sure that its new pack is mentioned.

So it has to wait and poll until the other process finishes. I guess
maybe that isn't the end of the world.

> Then in b3223761c8 ("update_info_refs(): drop unused force parameter",
> 2019-04-05) Jeff removed the unused-for-a-decade "force" param.
> 
> So having gone through the history I think we're better off just
> dropping the --force argument entirely, or at least changing the
> docs.
> 
> Before this change the only thing it was doing was pruning stuff we
> haven't written since 2005-ish (see 3e15c67c90 ("server-info: throw away
> T computation as well.", 2005-12-04)), rather than "detect if useless"
> we should just write out the file again, and then skip if changed
> (i.e. this logic).

Yeah, my commit only ripped out the useless force parameter for
info/refs. For info/packs, there's still that weird "is is stale"
computation (which I fixed several bugs in). It's not entirely clear to
me if that can just go away, but I agree that if we can it's simpler and
more desirable to just generate the candidate result and see if it's
bit-for-bit identical or not.

I'm not entirely sure of all of the magic that "stale" check is trying
to accomplish. I think there's some bits in there that try to preserve
the existing ordering, but I don't know why anyone would care (and there
are so many cases where the ordering gets thrown out that I think
anybody who does care is likely to get disappointed).

-Peff

  parent reply	other threads:[~2019-05-12  4:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-11  1:34 [PATCH] update-server-info: avoid needless overwrites Eric Wong
2019-05-11  7:35 ` Eric Sunshine
2019-05-11 20:47   ` [PATCH v2] " Eric Wong
2019-05-11 21:17 ` [PATCH] " Eric Wong
2019-05-11 23:37 ` Ævar Arnfjörð Bjarmason
2019-05-12  0:38   ` Eric Wong
2019-05-12  4:08   ` Jeff King [this message]
2019-05-12  7:16     ` Ævar Arnfjörð Bjarmason
2019-05-14  9:47       ` Jeff King
2019-05-14 10:33         ` Ævar Arnfjörð Bjarmason
2019-05-14 11:24           ` Jeff King
2019-05-14 11:57             ` Ævar Arnfjörð Bjarmason
2019-05-14 11:50         ` Eric Wong
2019-05-14 12:13           ` dumb HTTP things I want to do Eric Wong
2019-05-14 12:27             ` Jeff King
2019-05-14 12:19           ` [PATCH] update-server-info: avoid needless overwrites Ævar Arnfjörð Bjarmason
2019-05-14 12:29             ` Jeff King
2019-05-15  0:45             ` [PATCH 2/1] server-info: conditionally update on fetch Eric Wong
2019-05-15 20:38               ` [WIP] repack leaving stale entries in objects/info/packs Eric Wong
2019-05-15 21:48                 ` Jeff King
2019-05-23  8:59                   ` [PATCH] server-info: do not list unlinked packs Eric Wong
2019-05-23 10:24                     ` Jeff King
2019-05-23 17:27                       ` [PATCH v2] " Eric Wong
2019-05-24  6:05                         ` Jeff King
2019-05-24  7:34                         ` Ævar Arnfjörð Bjarmason
2019-05-13 23:17 ` [PATCH v3] update-server-info: avoid needless overwrites Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190512040825.GA25370@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).