git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: git@vger.kernel.org,
	"Quentin Casasnovas" <quentin.casasnovas@oracle.com>,
	"Shawn Pearce" <spearce@spearce.org>, "Jeff King" <peff@peff.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: Huge performance bottleneck reading packs
Date: Wed, 12 Oct 2016 15:45:39 -0700
Message-ID: <xmqqpon5190s.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <ea8db41f-2ea4-b37b-e6f8-1f1d428aea5d@oracle.com> (Vegard Nossum's message of "Thu, 13 Oct 2016 00:30:52 +0200")

Vegard Nossum <vegard.nossum@oracle.com> writes:

> A closer inspection reveals the problem to really be that this is an
> extremely hot path with more than -- holy cow -- 4,106,756,451
> iterations on the 'packed_git' list for a single 'git fetch' on my
> repository. I'm guessing the patch above just made the inner loop
> ever so slightly slower.

Very plausible, and this ...

> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> and 4 tmp_pack_* files).

... may explain why nobody else has seen a difference.  

Is there a reason why your repository has that many pack files?  Is
automatic GC not working for some reason?
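A quick way to check whether a repository has drifted into this state is to tally what sits in .git/objects/pack (`git count-objects -v` also reports the pack count directly, on its "packs:" line). The Python sketch below is illustrative only. The helper names `tally`, `count_pack_files` and `too_many_packs` are hypothetical, and the threshold assumes git's documented `gc.autoPackLimit` default of 50. The real auto-gc check also skips packs marked with a .keep file, which this sketch does not.

```python
import os
from collections import Counter

# Assumption: mirrors git's documented default for gc.autoPackLimit; when
# there are more packs than this, "git gc --auto" consolidates them.
AUTO_PACK_LIMIT = 50


def tally(names):
    """Classify entries of an objects/pack directory by kind."""
    counts = Counter()
    for name in names:
        if name.startswith("tmp_pack_"):
            counts["tmp"] += 1  # usually leftovers from interrupted pack writes
        elif name.endswith(".pack"):
            counts["pack"] += 1
        elif name.endswith(".idx"):
            counts["idx"] += 1
        # other files (.keep, .bitmap, ...) are ignored in this sketch
    return counts


def count_pack_files(pack_dir=os.path.join(".git", "objects", "pack")):
    """Tally a real pack directory (hypothetical helper; ignores .keep markers)."""
    return tally(os.listdir(pack_dir))


def too_many_packs(counts, limit=AUTO_PACK_LIMIT):
    """True when the pack count exceeds the assumed auto-gc threshold."""
    return counts["pack"] > limit
```

For the directory described above, `tally` would report 1042 packs, 1042 idx files and 4 temporary files, far past that threshold.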

"gc" tries to keep the number of packs reasonably low, because
having too many packs hurts performance for multiple reasons,
including:

 * Every object in a pack that is stored in delta format (i.e. only
   its difference from another object is recorded) must eventually
   resolve, through its chain of bases, to an object recorded in
   full in the same packfile.

 * A single packfile records each object only once, but it is
   normal (and often required, because of the point above) for the
   same object to appear in multiple packfiles.

 * Locating an object in a single packfile uses its .idx file to
   binary-search a sorted list of object names, which is efficient,
   but this cost is multiplied linearly by the number of packs in
   your repository.
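The last point can be made concrete with a small model. In this Python sketch (all names hypothetical), each pack is represented only by the sorted list of object names its .idx holds, and a lookup tries the packs one after another because it cannot know in advance which pack, if any, contains the object. It counts binary-search probes so the linear growth is visible; it deliberately ignores the 256-entry fan-out table that real .idx files use to narrow the search range first.

```python
def find_in_pack(sorted_names, name):
    """Binary-search one pack's sorted name list; return (found, probes)."""
    lo, hi, probes = 0, len(sorted_names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_names[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_names) and sorted_names[lo] == name
    return found, probes


def find_object(packs, name):
    """Search every pack in turn; return (pack index or None, total probes)."""
    total = 0
    for i, names in enumerate(packs):
        found, probes = find_in_pack(names, name)
        total += probes
        if found:
            return i, total
    # Not found anywhere: every pack was searched, so the cost is
    # (number of packs) x (cost of one binary search).
    return None, total
```

Looking up an object that is in none of the packs is the worst case: every pack is searched, so with the 1042 packs described above each such lookup pays 1042 binary searches instead of one after a full repack.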


Thread overview: 22+ messages
2016-10-12 22:30 Huge performance bottleneck reading packs Vegard Nossum
2016-10-12 22:45 ` Junio C Hamano [this message]
2016-10-13  7:17   ` Vegard Nossum
2016-10-13 14:50     ` Jeff King
2016-10-12 23:01 ` Jeff King
2016-10-12 23:18   ` Jeff King
2016-10-12 23:47     ` Jeff King
2016-10-13  9:04       ` Vegard Nossum
2016-10-14  9:35         ` Jakub Narębski
2016-10-13  7:20   ` Vegard Nossum
2016-10-13 15:26     ` Jeff King
2016-10-13 16:53       ` [PATCH] fetch: use "quick" has_sha1_file for tag following Jeff King
2016-10-13 17:04         ` Jeff King
2016-10-13 20:06           ` Jeff King
2016-10-14 17:39             ` Junio C Hamano
2016-10-14 18:59               ` Jeff King
2016-10-17 17:30                 ` Junio C Hamano
2016-10-18 10:28                   ` Jeff King
2016-10-13 18:18       ` Huge performance bottleneck reading packs Vegard Nossum
2016-10-13 20:43         ` Jeff King
2016-10-14  6:55           ` Vegard Nossum
2016-10-14 19:00             ` Jeff King
