git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: "Shawn O. Pearce" <spearce@spearce.org>, git@vger.kernel.org
Subject: Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos
Date: Tue, 29 Jan 2013 07:58:01 -0800	[thread overview]
Message-ID: <7vwquwrng6.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <20130129082939.GB6396@sigill.intra.peff.net> (Jeff King's message of "Tue, 29 Jan 2013 03:29:40 -0500")

Jeff King <peff@peff.net> writes:

>> I also wonder if we would be helped by another "repack" mode that
>> coalesces small packs into a single one with minimum overhead, and
>> run that often from "gc --auto", so that we do not end up having to
>> have 50 packfiles.
>> ...
>
> I'm not sure. If I understand you correctly, it would basically just be
> concatenating packs without trying to do delta compression between the
> objects which are ending up in the same pack. So it would save us from
> having to do (up to) 50 binary searches to find an object in a pack, but
> would not actually save us much space.

The point is not about space.  Disk is cheap, and it is not making
it any worse than what happens to your target audience, that is a
fetch-only repository with only "gc --auto" in it, where nobody
passes "-f" to "repack" to cause recomputation of delta.

What I was trying to seek was a way to reduce the runtime penalty we
pay every time we run git in such a repository.

 - Object look-up cost will become log2(50*n) from 50*log2(n), which
   is about 50/log2(50) improvement;

 - System resource cost we incur by having to keep 50 file
   descriptors open and maintaining 50 mmap windows will reduce by
   50 fold.

 - Anything else I missed?

> I would be interested to see the timing on how quick it is compared to a
> real repack,...

Yes, that is what I meant by "wonder if we would be helped by" ;-)

> But how do these somewhat mediocre concatenated packs get turned into
> real packs?

How do they get processed in a fetch-only repositories that
sometimes run "gc --auto" today?  By runninng "repack -a -d -f"
occasionally, perhaps?

At some point, you would need to run a repack that involves a real
object-graph traversal that feeds you the paths for objects to
obtain a reasonably compressed pack.  We can inspect existing packs
and guess a rough traversal order for commits, but for trees and
blobs, you cannot unify existing delta chains from multiple packs
effectively with data in the pack alone.

  parent reply	other threads:[~2013-01-29 15:58 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-26 22:40 [PATCH 0/2] optimizing pack access on "read only" fetch repos Jeff King
2013-01-26 22:40 ` [PATCH 1/2] fetch: run gc --auto after fetching Jeff King
2013-01-27  1:51   ` Jonathan Nieder
     [not found]   ` <87bmopzbqx.fsf@gmail.com>
2017-07-12 20:00     ` git gc --auto aquires *.lock files that make a subsequent git-fetch error out Jeff King
2017-07-12 20:30       ` Ævar Arnfjörð Bjarmason
2017-07-12 20:43         ` Jeff King
2013-01-26 22:40 ` [PATCH 2/2] fetch-pack: avoid repeatedly re-scanning pack directory Jeff King
2013-01-27 10:27   ` Jonathan Nieder
2013-01-27 20:09     ` Junio C Hamano
2013-01-27 23:20       ` Jonathan Nieder
2013-01-27  6:32 ` [PATCH 0/2] optimizing pack access on "read only" fetch repos Junio C Hamano
2013-01-29  8:06   ` Shawn Pearce
2013-01-29  8:29   ` Jeff King
2013-01-29 15:25     ` Martin Fick
2013-01-29 15:58     ` Junio C Hamano [this message]
2013-01-29 21:19       ` Jeff King
2013-01-29 22:26         ` Junio C Hamano
2013-01-31 16:47         ` Shawn Pearce
2013-02-01  9:14           ` Jeff King
2013-02-02 10:07             ` Shawn Pearce
2013-01-29 11:01   ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vwquwrng6.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).