From: Shawn Pearce <spearce@spearce.org>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos
Date: Thu, 31 Jan 2013 08:47:37 -0800
Message-ID: <CAJo=hJuGw8x=VrjWhvZhzakuhWrCWr2FRuEsNt5gQNC=6PPuVw@mail.gmail.com>
In-Reply-To: <20130129211932.GA17377@sigill.intra.peff.net>

On Tue, Jan 29, 2013 at 1:19 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:
>
>> The point is not about space. Disk is cheap, and it is not making
>> it any worse than what happens to your target audience, that is a
>> fetch-only repository with only "gc --auto" in it, where nobody
>> passes "-f" to "repack" to cause recomputation of delta.
>>
>> What I was trying to seek was a way to reduce the runtime penalty we
>> pay every time we run git in such a repository.
>>
>> - Object look-up cost will become log2(50*n) from 50*log2(n), which
>> is about 50/log2(50) improvement;
>
> Yes and no. Our heuristic is to look at the last-used pack for an
> object. So assuming we have locality of requests, we should quite often
> get "lucky" and find the object in the first log2 search. Even if we
> don't assume locality, a situation with one large pack and a few small
> packs will have the large one as "last used" more often than the others,
> and it will also have the looked-for object more often than the others

Opening all of those files does impact performance. How much depends on
how slow your open(2) syscall is. I know that on Mac OS X it's not the
fastest function we get from the C library. Performing ~40 opens to
look through the most recent pack files and finally find the "real"
pack that contains the tag you asked `git show` for isn't that quick.
Some of us also use Git on network-based filesystems, which are slow
compared to local-disk Linux ext2/3/4 with gobs of free RAM.
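
To make the trade-off concrete, here is a minimal Python sketch of the
"last-used pack first" heuristic, counting how many pack indexes get
probed. It is not Git's actual C implementation: the Pack and PackSet
classes, the integer object ids, and the 50 packs of 1000 objects each
are all assumptions made up for illustration.

```python
import bisect
import math


class Pack:
    """Hypothetical stand-in for one pack index: a sorted list of object ids."""

    def __init__(self, name, object_ids):
        self.name = name
        self.ids = sorted(object_ids)

    def contains(self, oid):
        # Binary search: roughly log2(n) comparisons for n objects.
        i = bisect.bisect_left(self.ids, oid)
        return i < len(self.ids) and self.ids[i] == oid


class PackSet:
    """Searches the last-used pack first, then the remaining packs in order."""

    def __init__(self, packs):
        self.packs = list(packs)
        self.last_used = None
        self.probes = 0  # total number of pack indexes searched so far

    def find(self, oid):
        order = list(self.packs)
        if self.last_used is not None:
            # Move the last-used pack to the front of the search order.
            order.remove(self.last_used)
            order.insert(0, self.last_used)
        for pack in order:
            self.probes += 1
            if pack.contains(oid):
                self.last_used = pack
                return pack
        return None


def worst_case_steps(num_packs, objects_per_pack):
    """Return (separate, merged) binary-search step estimates.

    separate: every pack is searched, num_packs * log2(objects_per_pack)
    merged:   one combined index, log2(num_packs * objects_per_pack)
    """
    separate = num_packs * math.log2(objects_per_pack)
    merged = math.log2(num_packs * objects_per_pack)
    return separate, merged


def demo(num_packs=50, per_pack=1000, lookups=100):
    # Pack i holds the (made-up) object ids i*per_pack .. (i+1)*per_pack - 1.
    packs = [
        Pack(f"pack-{i}", range(i * per_pack, (i + 1) * per_pack))
        for i in range(num_packs)
    ]
    pack_set = PackSet(packs)
    # Look up a run of objects that all live in the final pack (good locality).
    start = (num_packs - 1) * per_pack
    for oid in range(start, start + lookups):
        pack_set.find(oid)
    return pack_set.probes


if __name__ == "__main__":
    print("probes with locality:", demo())
    print("worst-case steps (separate, merged):", worst_case_steps(50, 1000))
```

With these made-up numbers the first lookup probes all 50 packs and the
remaining 99 hit the last-used pack immediately, for 149 probes in
total. The sketch only counts in-memory searches. On a real system the
first probe of each pack can also mean an open(2) of its index, which
is the cost I am describing above.
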
> So I can see how it is something we could potentially optimize, but I
> could also see it being surprisingly not a big deal. I'd be very
> interested to see real measurements, even of something as simple as a
> "master index" which can reference multiple packfiles.

I actually tried this many, many years ago. There are threads in the
archive about it. It's slower. We ruled it out.

>> - System resource cost we incur by having to keep 50 file
>> descriptors open and maintaining 50 mmap windows will reduce by
>> 50 fold.
>
> I wonder how measurable that is (and if it matters on Linux versus less
> efficient platforms).

It does matter. We know it has a negative impact on JGit even on Linux,
for example. You don't want 300 packs in a repository. 50 might be
tolerable. 300 is not.