git@vger.kernel.org mailing list mirror (one of many)
From: Vegard Nossum <vegard.nossum@oracle.com>
To: git@vger.kernel.org
Cc: "Quentin Casasnovas" <quentin.casasnovas@oracle.com>,
	"Shawn Pearce" <spearce@spearce.org>, "Jeff King" <peff@peff.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Huge performance bottleneck reading packs
Date: Thu, 13 Oct 2016 00:30:52 +0200
Message-ID: <ea8db41f-2ea4-b37b-e6f8-1f1d428aea5d@oracle.com>

Hi all,

I've bisected a performance regression (noticed by Quentin and myself)
which caused a 'git fetch' to go from ~1m30s to ~2m40s:

commit 47bf4b0fc52f3ad5823185a85f5f82325787c84b
Author: Jeff King <peff@peff.net>
Date:   Mon Jun 30 13:04:03 2014 -0400

     prepare_packed_git_one: refactor duplicate-pack check

Reverting this commit from a recent mainline master brings the time back
down from ~2m24s to ~1m19s.

The bisect log:

v2.8.1 -- 2m41s, 2m50s (bad)
v1.9.0 -- 1m39s, 1m46s (good)

2.3.4.312.gea1fd48 -- 2m40s
2.1.0.18.gc285171 -- 2m42s
2.0.0.140.g6753d8a -- 1m27s
2.0.1.480.g60e2f5a -- 1m34s
2.0.2.631.gad25da0 -- 2m39s
2.0.1.565.ge0a064a -- 1m30s
2.0.1.622.g2e42338 -- 2m29s
2.0.0.rc1.32.g5165dd5 -- 1m30s
2.0.1.607.g5418212 -- 1m32s
2.0.1.7.g6dda4e6 -- 1m28s
2.0.1.619.g6e40947 -- 2m25s
2.0.1.9.g47bf4b0 -- 2m18s
2.0.1.8.gd6cd00c -- 1m36.542s

However, the commit found by 'git bisect' above appears just fine to me;
I haven't been able to spot a bug in it.

Closer inspection shows that the real problem is that this is an
extremely hot path, with more than -- holy cow -- 4,106,756,451
iterations over the 'packed_git' list for a single 'git fetch' on my
repository. I'm guessing the patch above just made the inner loop
ever so slightly slower.
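
To illustrate why this path gets so hot, here is a minimal Python sketch
of the general pattern (not git's actual C code): for every idx file
found in the pack directory, walk a linked list of already-known packs
looking for a duplicate. The names Pack and rescan, the pack-NNNN file
names, and the choice to prepend new nodes are assumptions made for
illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pack:
    """One node of a singly linked list, loosely modelled on 'packed_git'."""
    name: str
    next: Optional["Pack"] = None


def rescan(head, idx_names):
    """Simulate one scan of the pack directory.

    For every idx file name, walk the list looking for an already-known
    pack and prepend a new node if none is found.
    Returns the (possibly new) list head and the number of nodes visited.
    """
    iterations = 0
    for name in idx_names:
        node = head
        while node is not None:
            iterations += 1
            if node.name == name:
                break  # duplicate found, nothing to add
            node = node.next
        if node is None:
            head = Pack(name, head)  # unknown pack: add it to the list
    return head, iterations


# Hypothetical file names; 1042 matches the idx count in this repository.
names = [f"pack-{i:04d}" for i in range(1042)]
head, first = rescan(None, names)   # initial scan populates the list
head, second = rescan(head, names)  # a rescan finds every pack already present
print(first, second)
```

In this model every scan visits on the order of n^2/2 list nodes for n
packs, so the cost of one rescan grows quadratically with the number of
packs, and it is then multiplied by the number of rescans.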

My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
and 4 tmp_pack_* files).

I am convinced that it is not necessary to rescan the entire pack
directory 11,348 times or do all 4 _BILLION_ memcmp() calls for a single
'git fetch', even for a large repository like mine.
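
As a rough sanity check on those figures, the arithmetic below divides
the total iteration count by the number of rescans and then by the
number of idx files. It uses only the numbers quoted in this message;
the variable names are just for illustration.

```python
total_iterations = 4_106_756_451  # 'packed_git' list iterations reported above
rescans = 11_348                  # pack directory rescans reported above
idx_files = 1042                  # idx files in .git/objects/pack/

# Average work per rescan, and per idx file within one rescan.
per_rescan = total_iterations / rescans
per_idx_file = per_rescan / idx_files

print(f"{per_rescan:,.0f} list iterations per rescan")
print(f"{per_idx_file:.0f} list nodes visited per idx file per rescan")
```

That works out to roughly 360 thousand list iterations per rescan, or a
few hundred list nodes visited for each idx file every time the
directory is rescanned.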

I could try to write a patch to reduce the number of times we rescan the
pack directory. However, I've never even looked at the file before
today, so any hints regarding what would need to be done would be
appreciated.

Thanks,

(Cced some people with changes in the area.)


Vegard
