From: Junio C Hamano <gitster@pobox.com>
To: Doug Kelly <dougk.ff7@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: Question: .idx without .pack causes performance issues?
Date: Tue, 21 Jul 2015 11:57:48 -0700
Message-ID: <xmqq4mkxwd77.fsf@gitster.dls.corp.google.com>
In-Reply-To: <CAEtYS8QWCg5_DtrJw-e+c50vcG0OpciR6LWon-3GgyngGn+0pQ@mail.gmail.com> (Doug Kelly's message of "Tue, 21 Jul 2015 13:41:58 -0500")

Doug Kelly <dougk.ff7@gmail.com> writes:

> I just wanted to relay an issue we've seen before at my day job (and
> it just recently cropped up again).  When moving users from Git for
> Windows 1.8.3 to 1.9.5, we found a few users started having operations
> take an excruciatingly long amount of time.  At some point, we traced
> the issue to a number of .pack files had been deleted (possibly
> garbage collected?) -- but their associated .idx files were still
> present.  Upon removing the "orphaned" idx files, we found performance
> returned to normal.  Otherwise, git fsck reported no issues with the
> repositories.
>
> Other users have noted that using git gc would sometimes correct the
> issue for them, but not always.
>
> Anyway, has anyone else experienced this performance degradation?

I wouldn't be surprised if leftover ".idx" files that lack a
corresponding ".pack" affected performance, but I think you really
have to work at getting into such a situation (unless your operating
system is very cooperative and tries hard to corrupt your
repository, that is ;-), so I also wouldn't be surprised if you were
the first one to report this.

We open the ".idx" files and try to keep as many of them in-core as
we can, without opening the corresponding ".pack" until the data is
needed.

When we need an object, we learn from an ".idx" file that a
particular pack ought to have a copy of it, and then attempt to open
the corresponding ".pack" file.  If this fails, we do protect
ourselves from strange repositories with only ".idx" files by not
using that ".idx" and trying to see if the sought-after object exists
elsewhere (and if it does not, we say "no such object", which is also
a correct thing to do).
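
To illustrate the lookup described above, here is a toy sketch in
Python.  It is not git's actual C code; PackIndex and find_object
are made-up names, and a plain set of object names stands in for the
real ".idx" contents.

```python
class PackIndex:
    """Toy stand-in for an in-core .idx (hypothetical, not git's struct)."""

    def __init__(self, pack_path, object_names):
        self.pack_path = pack_path              # where the .pack should be
        self.object_names = set(object_names)   # what the .idx claims to hold


def find_object(indexes, name):
    """Return the path of a pack that can supply `name`, or None."""
    for idx in indexes:
        if name not in idx.object_names:
            continue
        try:
            # Only now is the .pack touched; the .idx alone told us
            # which pack ought to have the object.
            with open(idx.pack_path, "rb"):
                return idx.pack_path
        except OSError:
            # .idx without a usable .pack: do not use it, look elsewhere.
            continue
    # Nobody could supply the object: "no such object".
    return None
```

Note that in this sketch nothing is remembered about a failed open,
so every lookup that hits the orphaned ".idx" retries it.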

I do not think, however, that we mark the in-core structure that
corresponds to an open ".idx" file in any way when such a failure
happens.  If we really cared enough, we could do so, saying "we know
there is an .idx file, but do not bother looking at it again, as we
know the corresponding .pack is missing", and that would speed things
up a bit, essentially bringing us back to the sane situation in which
no ".idx" lacks its corresponding ".pack".
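
A toy sketch in Python of what such marking could look like follows.
This is an assumption about one possible shape, not existing git
code; the pack_missing flag, the open_attempts counter and the names
PackIndex and find_object are all hypothetical.

```python
import os


class PackIndex:
    """Toy stand-in for an in-core .idx (hypothetical, not git's struct)."""

    def __init__(self, pack_path, object_names):
        self.pack_path = pack_path
        self.object_names = set(object_names)
        self.pack_missing = False   # hypothetical "do not bother" mark
        self.open_attempts = 0      # only here to make the saving visible


def find_object(indexes, name):
    """Return the path of a pack that can supply `name`, or None."""
    for idx in indexes:
        # A marked .idx is skipped without touching the filesystem.
        if idx.pack_missing or name not in idx.object_names:
            continue
        idx.open_attempts += 1
        if os.path.exists(idx.pack_path):
            return idx.pack_path
        # Remember the failure so later lookups do not retry this .idx.
        idx.pack_missing = True
    return None
```

With the mark in place, repeated lookups against an orphaned ".idx"
cost one failed filesystem check in total rather than one per lookup.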

I do not think it is worth the effort, though.  It would be more
fruitful to find out how you end up with ".idx exists but not
corresponding .pack" and if that is some systemic failure, see if
there is a way to prevent that from happening in the first place.

Also, I think it may not be a bad idea to teach "gc" to remove stale
".idx" files that do not have corresponding ".pack" as garbage.
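
In the meantime, spotting such leftovers by hand is simple enough.
Here is a small Python sketch (not part of git; the function name is
made up) that lists "pack-*.idx" files in a pack directory, typically
".git/objects/pack", that have no ".pack" next to them.  It only
reports candidates and deletes nothing.

```python
from pathlib import Path


def orphaned_idx_files(pack_dir):
    """Return pack-*.idx paths in pack_dir that lack a sibling .pack."""
    pack_dir = Path(pack_dir)
    return sorted(
        idx
        for idx in pack_dir.glob("pack-*.idx")
        # pack-<name>.idx pairs with pack-<name>.pack
        if not idx.with_suffix(".pack").exists()
    )


if __name__ == "__main__":
    for path in orphaned_idx_files(".git/objects/pack"):
        print(path)
```

It is probably wise not to run anything like this while a repack or
gc is in progress in the same repository.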
