git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: git@vger.kernel.org,
	"Quentin Casasnovas" <quentin.casasnovas@oracle.com>,
	"Shawn Pearce" <spearce@spearce.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH] fetch: use "quick" has_sha1_file for tag following
Date: Thu, 13 Oct 2016 13:04:43 -0400	[thread overview]
Message-ID: <20161013170443.43slna3zvcvrse5r@sigill.intra.peff.net> (raw)
In-Reply-To: <20161013165344.jv7hyj74q33yb4ip@sigill.intra.peff.net>

On Thu, Oct 13, 2016 at 12:53:44PM -0400, Jeff King wrote:

> -- >8 --
> Subject: [PATCH] fetch: use "quick" has_sha1_file for tag following

A few comments on my own patch...

> This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
> accuracy for speed, in cases where we might be racy with a
> simultaneous repack. This is similar to the fix in 0eeb077
> (index-pack: avoid excessive re-reading of pack directory,
> 2015-06-09). As with that case, it's OK for has_sha1_file()
> occasionally say "no I don't have it" when we do, because
> the worst case is not a corruption, but simply that we may
> fail to auto-follow a tag that points to it.

Failing in this direction doesn't make me feel great. I had hoped it
would fail the _other_ way, and request an object that we might already
have.

There are two alternatives that might be worth pursuing:

  1. The thing that makes this really painful is the quadratic
     duplicate-search in prepare_packed_git_one(). We could speed that
     up pretty easily by keeping a hash of the packs indexed by their
     names, and looking up in that.

     That drops the quadratic factor, but it's still going to be
     O(nr_tags * nr_packs), as we have to at the very least readdir()
     all of objects/pack to see that nothing new showed up.

     I wonder if we could trust the timestamp on the objects/pack
     directory to avoid re-reading it at all. That turns it into a
     single stat() call.

  2. We could stop bothering to reprepare_packed_git() only after the
     nth time it is called. This basically turns on the "sacrifice
     accuracy for speed" behavior automatically, but it means that most
     cases would never do so, because it wouldn't be creating a
     performance issue in the first place.

     It feels weird and flaky that git might behave differently based on
     the number of unfetched tags the remote happens to have, though.

> Here are results from the included perf script, which sets
> up a situation similar to the one described above:
> 
> Test            HEAD^               HEAD
> ----------------------------------------------------------
> 5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

The situation in this perf script is obviously designed to show off this
one specific optimization. It feels a bit overly specific, and I doubt
anybody will be all that interested in running it again once the fix is
in. OTOH, I did want to document my reproduction steps, and this seemed
like the only reasonable place to do so. And as the perf suite is
already pretty expensive, perhaps nobody minds adding to it too much.

-Peff

  reply	other threads:[~2016-10-13 17:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-12 22:30 Huge performance bottleneck reading packs Vegard Nossum
2016-10-12 22:45 ` Junio C Hamano
2016-10-13  7:17   ` Vegard Nossum
2016-10-13 14:50     ` Jeff King
2016-10-12 23:01 ` Jeff King
2016-10-12 23:18   ` Jeff King
2016-10-12 23:47     ` Jeff King
2016-10-13  9:04       ` Vegard Nossum
2016-10-14  9:35         ` Jakub Narębski
2016-10-13  7:20   ` Vegard Nossum
2016-10-13 15:26     ` Jeff King
2016-10-13 16:53       ` [PATCH] fetch: use "quick" has_sha1_file for tag following Jeff King
2016-10-13 17:04         ` Jeff King [this message]
2016-10-13 20:06           ` Jeff King
2016-10-14 17:39             ` Junio C Hamano
2016-10-14 18:59               ` Jeff King
2016-10-17 17:30                 ` Junio C Hamano
2016-10-18 10:28                   ` Jeff King
2016-10-13 18:18       ` Huge performance bottleneck reading packs Vegard Nossum
2016-10-13 20:43         ` Jeff King
2016-10-14  6:55           ` Vegard Nossum
2016-10-14 19:00             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161013170443.43slna3zvcvrse5r@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=quentin.casasnovas@oracle.com \
    --cc=spearce@spearce.org \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).