git@vger.kernel.org mailing list mirror (one of many)
From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>, Colby Ranger <cranger@google.com>
Subject: Re: Using bitmaps to accelerate fetch and clone
Date: Mon, 1 Oct 2012 08:59:17 +0700
Message-ID: <CACsJy8D5AXSWAdK7tgtXnE4Ro_+okaYM=zf9JnQfObkcx=FCOw@mail.gmail.com>
In-Reply-To: <CAJo=hJsWczUqhvj6Kqsomeh9WxAAJO-Yc-=61k94jos6vVtEjQ@mail.gmail.com>

On Mon, Oct 1, 2012 at 8:07 AM, Shawn Pearce <spearce@spearce.org> wrote:
>> You mentioned this before in your idea mail a while back. I wonder if
>> it's worth storing bitmaps for all packs, not just the self contained
>> ones.
>
> Colby and I started talking about this late last week too. It seems
> feasible, but does add a bit more complexity to the algorithm used
> when enumerating.

Yes. Though on the server side, if it's too much trouble, the packer
can just ignore open packs and use only closed ones.

>> We could have one leaf bitmap per pack to mark all leaves where
>> we'll need to traverse outside the pack. Commit leaves are the best as
>> we can potentially reuse commit bitmaps from other packs. Tree leaves
>> will be followed in the normal/slow way.
>
> Yes, Colby proposed the same idea.
>
> We cannot make a "leaf bitmap per pack". The leaf SHA-1s are not in
> the pack and therefore cannot have a bit assigned to them.

We could mark all objects _in_ the pack that lead to an external
object. That's what I meant by leaves. We would still need to parse
those leaves to find the actual SHA-1s that are outside the pack. Or
we could go with your approach below too.
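
To make the "mark objects in the pack" idea concrete, here is a rough
sketch, not real pack code. It assumes a hypothetical `pack_order` list
(object ids in pack order, one bit each) and a hypothetical `refs` map
from an object id to the ids it references directly. Nothing here
reflects an actual on-disk format.

```python
def leaf_bitmap(pack_order, refs):
    """Set bit i when the i-th object in the pack directly references
    at least one object that is not stored in this pack."""
    in_pack = set(pack_order)
    bitmap = 0
    for bit, oid in enumerate(pack_order):
        if any(r not in in_pack for r in refs.get(oid, ())):
            bitmap |= 1 << bit
    return bitmap


def external_targets(pack_order, refs, bitmap):
    """Parse only the marked leaves to recover the ids that live
    outside the pack."""
    in_pack = set(pack_order)
    outside = set()
    for bit, oid in enumerate(pack_order):
        if (bitmap >> bit) & 1:
            outside.update(r for r in refs.get(oid, ()) if r not in in_pack)
    return outside
```

Only the marked objects have to be parsed at enumeration time. An
external commit found this way could then be looked up in another
pack's bitmaps, while an external tree would have to be walked the slow
way.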

> We could
> add a new section that listed the unique leaf SHA-1s in their own
> private table, and then assigned per bitmap a leaf bitmap that set to
> 1 for any leaf object that is outside of the pack.
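
If I read this right, it could look roughly like the sketch below. This
is only my interpretation: `in_pack`, `refs`, and both function names
are made up for illustration, and the real layout of such a section is
not decided anywhere in this thread.

```python
def build_leaf_table(in_pack, refs):
    """Unique ids referenced from inside the pack but stored outside it,
    sorted so every leaf gets a stable bit position."""
    return sorted({r for oid in in_pack
                   for r in refs.get(oid, ())
                   if r not in in_pack})


def leaf_bits_for(commit, in_pack, refs, leaves):
    """Leaf bitmap for one commit that is in the pack: bit i is set when
    leaves[i] is reached while walking from that commit."""
    index = {oid: i for i, oid in enumerate(leaves)}
    seen, stack, bits = set(), [commit], 0
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in in_pack:
            # Outside the pack: record the leaf and stop walking here.
            bits |= 1 << index[oid]
            continue
        stack.extend(refs.get(oid, ()))
    return bits
```

With a table like this the leaves would not need to be parsed at
enumeration time, at the cost of storing the extra SHA-1s in the index.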


> One of the problems we have seen with these non-closed packs is they
> waste an incredible amount of disk. As an example, do a `git fetch`
> from Linus tree when you are more than a few weeks behind. You will
> get back more than 100 objects, so the thin pack will be saved and
> completed with additional base objects. That thin pack will go from a
> few MiBs to more than 40 MiB of data on disk, thanks to the redundant
> base objects being appended to the end of the pack. For most uses
> these packs are best eliminated and replaced with a new complete
> closure pack. The redundant base objects disappear, and Git stops
> wasting a huge amount of disk.

That's probably a different problem. I appreciate the disk savings,
but I would not want to wait a few more minutes for a repack on every
git-fetch. But if this bitmap thing makes repack much faster than it
is now, repacking after every git-fetch may become practical.
-- 
Duy
