git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Shawn Pearce <spearce@spearce.org>,
	steve.norman@thomsonreuters.com, git <git@vger.kernel.org>
Subject: Re: [PATCH] index-pack: avoid excessive re-reading of pack directory
Date: Wed, 10 Jun 2015 21:36:23 +0700	[thread overview]
Message-ID: <CACsJy8AfyYFcDdXN8jyRedhAx9f7c2n9Mmxs2W_FzfYd-f=AEg@mail.gmail.com> (raw)
In-Reply-To: <20150610140024.GA9395@peff.net>

On Wed, Jun 10, 2015 at 9:00 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Jun 09, 2015 at 08:46:24PM -0700, Shawn Pearce wrote:
>
>> > This patch introduces a "quick" flag to has_sha1_file which
>> > callers can use when they would prefer high performance at
>> > the cost of false negatives during repacks. There may be
>> > other code paths that can use this, but the index-pack one
>> > is the most obviously critical, so we'll start with
>> > switching that one.
>>
>> Hilarious. We did this in JGit back in ... uhm before 2009. :)
>>
>> But its Java. So of course we had to do optimizations.
>
> This is actually how Git did it up until v1.8.4.2, in 2013. I changed it
> then because the old way was racy (and git could flakily report refs as
> broken and skip them during repacks!).
>
> If you are doing it the "quick" way everywhere in JGit, you may want to
> reexamine the possibility for races. :)

I was expecting this :D

>> > @@ -3169,6 +3169,8 @@ int has_sha1_file(const unsigned char *sha1)
>> >                 return 1;
>> >         if (has_loose_object(sha1))
>> >                 return 1;
>> > +       if (flags & HAS_SHA1_QUICK)
>> > +               return 0;
>> >         reprepare_packed_git();
>> >         return find_pack_entry(sha1, &e);
>>
>> Something else we do is readdir() over the loose objects and store
>> them in a map in memory. That way we avoid stat() calls during that
>> has_loose_object() path. This is apparently a win enough of the time
>> that we always do that when receiving a pack over the wire (client or
>> server).
>
> Yeah, I thought about that while writing this. It would be a win as long
> as you have a small number of loose objects and were going to make a
> large number of requests (otherwise you are traversing even though
> nobody is going to look it up). According to perf, though, loose object
> lookups are not a large expenditure[1].
>
> I'm also hesitant to go that route because it's basically caching, which
> introduces new opportunities for race conditions when the cache is stale
> (we do the same thing with loose refs, and we have indeed run into races
> there).

Watchman may help avoid races _with_ caching. But we need to support
both ways in that case, falling back to normal file poking when
watchman gives up, or when we're on Windows. Extra work needs big
enough performance gain to justify. I haven't seen that gain yet.

> [1] As measured mostly by __d_lookup_rcu calls. Of course, my patch
>     gives a 5% improvement over the original, and we were not spending
>     5% of the time there originally. I suspect part of the problem is
>     that we do the lookup under a lock, so the longer we spend there,
>     the more contention we have between threads, and the less
>     parallelism. Indeed, I just did a quick repeat of my tests with
>     pack.threads=1, and the size of the improvement shrinks.
-- 
Duy

  reply	other threads:[~2015-06-10 14:37 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38   ` Duy Nguyen
2015-05-21 15:53     ` steve.norman
2015-05-22  0:16       ` Duy Nguyen
2015-05-22  7:12         ` Jeff King
2015-05-22  8:35           ` steve.norman
2015-05-22 10:05             ` Duy Nguyen
2015-05-22 14:37               ` Junio C Hamano
2015-05-22 15:02               ` steve.norman
2015-05-22 23:51                 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Jeff King
2015-05-22 23:51                   ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00                     ` Michael Haggerty
2015-05-24  8:29                       ` Jeff King
2015-05-22 23:52                   ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54                   ` [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs Jeff King
2015-05-23  1:19                   ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23  1:21                     ` Duy Nguyen
2015-05-24  8:20                     ` Jeff King
2015-05-24  9:00           ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01             ` steve.norman
2015-06-05 12:18               ` Jeff King
2015-06-05 12:29                 ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:24                   ` Jeff King
2015-06-09 17:41                     ` Jeff King
2015-06-10  3:46                   ` Shawn Pearce
2015-06-10 14:00                     ` Jeff King
2015-06-10 14:36                       ` Duy Nguyen [this message]
2015-06-10 21:34                       ` Shawn Pearce
2015-06-05 14:20                 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8AfyYFcDdXN8jyRedhAx9f7c2n9Mmxs2W_FzfYd-f=AEg@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    --cc=steve.norman@thomsonreuters.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).