git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Martin Fick <mfick@codeaurora.org>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>, git@vger.kernel.org
Subject: Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos
Date: Tue, 29 Jan 2013 08:25:39 -0700	[thread overview]
Message-ID: <2c88f15d-405b-4dd1-8546-94f54f0ac565@email.android.com> (raw)
In-Reply-To: <20130129082939.GB6396@sigill.intra.peff.net>



Jeff King <peff@peff.net> wrote:

>On Sat, Jan 26, 2013 at 10:32:42PM -0800, Junio C Hamano wrote:
>
>> Both makes sense to me.
>> 
>> I also wonder if we would be helped by another "repack" mode that
>> coalesces small packs into a single one with minimum overhead, and
>> run that often from "gc --auto", so that we do not end up having to
>> have 50 packfiles.
>> 
>> When we have 2 or more small and young packs, we could:
>> 
>>  - iterate over idx files for these packs to enumerate the objects
>>    to be packed, replacing read_object_list_from_stdin() step;
>> 
>>  - always choose to copy the data we have in these existing packs,
>>    instead of doing a full prepare_pack(); and
>> 
>>  - use the order the objects appear in the original packs, bypassing
>>    compute_write_order().
>
>I'm not sure. If I understand you correctly, it would basically just be
>concatenating packs without trying to do delta compression between the
>objects which are ending up in the same pack. So it would save us from
>having to do (up to) 50 binary searches to find an object in a pack,
>but
>would not actually save us much space.
>
>I would be interested to see the timing on how quick it is compared to
>a
>real repack, as the I/O that happens during a repack is non-trivial
>(although if you are leaving aside the big "main" pack, then it is
>probably not bad).
>
>But how do these somewhat mediocre concatenated packs get turned into
>real packs? Pack-objects does not consider deltas between objects in
>the
>same pack. And when would you decide to make a real pack? How do you
>know you have 50 young and small packs, and not 50 mediocre coalesced
>packs?


If we are reconsidering repacking strategies, I would like to propose an approach that might be a more general improvement to repacking which would help in more situations. 

You could roll together any packs which are close in size, say within 50% of each other.  With this strategy you will end up with files which are spread out by size exponentially.  I implementated this strategy on top of the current gc script using keep files, it works fairly well:

https://gerrit-review.googlesource.com/#/c/35215/3/contrib/git-exproll.sh

This saves some time, but mostly it saves I/O when repacking regularly.  I suspect that if this strategy were used in core git that further optimizations could be made to also reduce the repack time, but I don't know enough about repacking to know?  We run it nightly on our servers, both write and read only mirrors. We us are a ratio of 5 currently to drastically reduce large repack file rollovers,

-Martin

  reply	other threads:[~2013-01-29 15:26 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-26 22:40 [PATCH 0/2] optimizing pack access on "read only" fetch repos Jeff King
2013-01-26 22:40 ` [PATCH 1/2] fetch: run gc --auto after fetching Jeff King
2013-01-27  1:51   ` Jonathan Nieder
     [not found]   ` <87bmopzbqx.fsf@gmail.com>
2017-07-12 20:00     ` git gc --auto aquires *.lock files that make a subsequent git-fetch error out Jeff King
2017-07-12 20:30       ` Ævar Arnfjörð Bjarmason
2017-07-12 20:43         ` Jeff King
2013-01-26 22:40 ` [PATCH 2/2] fetch-pack: avoid repeatedly re-scanning pack directory Jeff King
2013-01-27 10:27   ` Jonathan Nieder
2013-01-27 20:09     ` Junio C Hamano
2013-01-27 23:20       ` Jonathan Nieder
2013-01-27  6:32 ` [PATCH 0/2] optimizing pack access on "read only" fetch repos Junio C Hamano
2013-01-29  8:06   ` Shawn Pearce
2013-01-29  8:29   ` Jeff King
2013-01-29 15:25     ` Martin Fick [this message]
2013-01-29 15:58     ` Junio C Hamano
2013-01-29 21:19       ` Jeff King
2013-01-29 22:26         ` Junio C Hamano
2013-01-31 16:47         ` Shawn Pearce
2013-02-01  9:14           ` Jeff King
2013-02-02 10:07             ` Shawn Pearce
2013-01-29 11:01   ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c88f15d-405b-4dd1-8546-94f54f0ac565@email.android.com \
    --to=mfick@codeaurora.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).