From: Shawn Pearce <spearce@spearce.org>
To: Jeff King <peff@peff.net>
Cc: git <git@vger.kernel.org>, Vicent Marti <vicent@github.com>
Subject: Re: [PATCH 11/19] pack-objects: use bitmaps when packing objects
Date: Wed, 30 Oct 2013 10:38:57 +0000
Message-ID: <CAJo=hJs0uUdDfdo9g-FeUmed5Z4+S+spPb+4OL8NipN-GXxuxQ@mail.gmail.com>
In-Reply-To: <20131030082143.GM11317@sigill.intra.peff.net>

On Wed, Oct 30, 2013 at 8:21 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Oct 25, 2013 at 02:14:11PM +0000, Shawn O. Pearce wrote:
>> On Thu, Oct 24, 2013 at 6:04 PM, Jeff King <peff@peff.net> wrote:
>> > For bitmaps to be used, the following must be true:
>> >
>> >   1. We must be packing to stdout (as a normal `pack-objects` from
>> >      `upload-pack` would do).
>> >
>> >   2. There must be a .bitmap index containing at least one of the
>> >      "have" objects that the client is asking for.
>>
>> The client must explicitly "have" a commit that has a bitmap? In JGit
>> we allow the client to have anything, and walk backwards using
>> traditional graph traversal until a bitmap is found.
>
> If the bitmaps contain the full set of reachable objects and the client
> does not have any "haves" that are bitmapped, then we know that either:
>
>   1. Their "haves" are not reachable from the "wants"
>
>      or
>
>   2. Their "wants" are not bitmapped, and so the slice of "haves..wants"
>      has no bitmaps
>
> Since (1) is relatively rare, I think we are using this as a proxy for
> (2), so that we can do a regular walk rather than looking around for
> bitmaps that don't exist. But I may be misremembering the reasoning.
> Vicent?

Ah. I am not sure if we do this in JGit. I think JGit's approach is to
check whether the "have" appears in a pack with bitmaps; this is a
simple lookup in the .idx file and does not require expanding any data
from the .bitmap file.
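
As a rough illustration of why that check is cheap, here is a minimal
sketch of an .idx-style membership test: a 256-entry fanout table
narrows the range, then a binary search runs over the sorted object
names. The in-memory lists and the function names are assumptions made
for the example; this is not JGit or git code and it does not parse a
real .idx file.

```python
import bisect

def build_fanout(sorted_ids):
    """fanout[b] = number of ids whose first byte is <= b."""
    fanout = [0] * 256
    for oid in sorted_ids:
        fanout[oid[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout

def pack_contains(sorted_ids, fanout, oid):
    """Return True if oid is in the sorted id list (hypothetical helper)."""
    first = oid[0]
    lo = fanout[first - 1] if first else 0   # start of the first-byte bucket
    hi = fanout[first]                       # end of the first-byte bucket
    i = bisect.bisect_left(sorted_ids, oid, lo, hi)
    return i < hi and sorted_ids[i] == oid
```

Nothing from the .bitmap file is touched here; only the sorted name
table is consulted.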


But that wasn't my question. :-)

Client sends "want B ; have E". What if E appears in the bitmapped
pack, but does not itself have a bitmap? Do you walk backwards from B
and switch to the bitmap algorithm when you find a commit that has a
bitmap and that bitmap contains E?
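
The walk-then-switch idea in this question can be sketched as follows.
This is a simplified model, not the behavior of either implementation:
bitmaps are plain sets of commit names (real bitmaps also cover trees
and blobs), and `parents`, `bitmaps`, `reachable` and
`objects_to_send` are hypothetical names chosen for the example.

```python
from collections import deque

def reachable(start, parents, bitmaps):
    """Commits reachable from start, using a bitmap as soon as one is found."""
    seen = set()
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        bitmap = bitmaps.get(commit)
        if bitmap is not None:
            # Switch to the bitmap: it already lists everything below here.
            seen |= bitmap
            continue
        # No bitmap on this commit: fall back to ordinary graph traversal.
        seen.add(commit)
        queue.extend(parents.get(commit, ()))
    return seen

def objects_to_send(wants, haves, parents, bitmaps):
    """Everything reachable from the wants minus everything the client has."""
    want_set, have_set = set(), set()
    for w in wants:
        want_set |= reachable(w, parents, bitmaps)
    for h in haves:
        have_set |= reachable(h, parents, bitmaps)
    return want_set - have_set

# Linear history B -> C -> D -> E -> F; only D carries a bitmap.
parents = {"B": ["C"], "C": ["D"], "D": ["E"], "E": ["F"], "F": []}
bitmaps = {"D": {"D", "E", "F"}}
result = objects_to_send(["B"], ["E"], parents, bitmaps)
```

In this toy history E has no bitmap of its own, so the "have" side is
walked the traditional way, while the "want" side stops walking at D.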

>> In JGit we write the to_pack list first, then the reuse pack. Our
>> rationale was the to_pack list is recent objects that are newer and
>> would appear first in a traditional traversal, so they should go at
>> the front of the stream. This does mean if they delta compress against
>> an object in that reuse_packfile slice they have to use REF_DELTA
>> instead of OFS_DELTA.
>
> That's a good point. In our case I think we do not delta against the
> reused packfile objects at all, as we simply send out the whole slice of
> packfile without making an entry for each object.

JGit also doesn't use the reused packfile as delta bases, because
there are no object entries to shove through the delta window. So
there is never any risk of a reference to a base later in the file. It
also means that the "thin pack" at the front of the stream is less
optimally compressed. At Google we sidestep that by doing GC at the
server very often, to try and keep the number of objects in that pack
low.

It might make sense to use a commit that covers the majority of the
reused pack as the edge base candidate case during delta compression
here, as though the client had sent us a "have" for that commit. I
don't think we have tried this in JGit. It would make deltas use
REF_DELTA, but the delta size has to be smaller than the REF_DELTA
header to be used in the stream so it's still a smaller overall
transfer.
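
The per-object cost difference can be sketched roughly as below. In the
pack format a REF_DELTA entry names its base with a 20-byte SHA-1,
while an OFS_DELTA entry stores a variable-length backward offset. The
function names and the "worthwhile" rule are assumptions for
illustration only; the rule ignores zlib compression and the per-object
type/size header, and it is not the check JGit or git actually applies.

```python
REF_DELTA_BASE_LEN = 20  # REF_DELTA names its base by 20-byte SHA-1

def ofs_delta_base_len(offset):
    """Bytes needed to encode a backward offset in an OFS_DELTA entry."""
    n = 1                 # the final byte carries the low 7 bits
    offset >>= 7
    while offset:
        offset -= 1       # the encoding subtracts one per continuation byte
        n += 1
        offset >>= 7
    return n

def delta_is_worthwhile(full_size, delta_size, base_ref_len):
    """Simplified rule (assumption): send the delta only if it is smaller."""
    return delta_size + base_ref_len < full_size
```

Under this model a base 128 bytes back costs 2 bytes to reference with
OFS_DELTA against 20 bytes with REF_DELTA, so a REF_DELTA against a
reused-pack base only pays off when the delta saves more than that
difference.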

>> Is this series running on github.com/torvalds/linux? Last Saturday I
>> ran a live demo clone comparing github.com/torvalds/linux to a JGit
>> bitmap clone and some guy heckled me because GitHub was only a few
>> seconds slower. :-)
>
> It is. Use kernel.org if you want to make fun of someone. :)

Hah. OK, so GitHub was a few seconds slower only because my
desktop is better connected to our data centers than to yours. Nicely
done, this patch series really works. :)
