git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: "Vicent Martí" <tanoku@gmail.com>,
	"Colby Ranger" <cranger@google.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Wed, 26 Jun 2013 01:11:17 -0400
Message-ID: <20130626051117.GB26755@sigill.intra.peff.net>
In-Reply-To: <CAFFjANRwBBcORhu4mwjESBfr4GJ3zDrgYvUhY=VxK9abv7k2MA@mail.gmail.com>

On Tue, Jun 25, 2013 at 09:33:11PM +0200, Vicent Martí wrote:

> > One way we side-stepped the size inflation problem in JGit was to only
> > use the bitmap index information when sending data on the wire to a
> > client. Here delta reuse plays a significant factor in building the
> > pack, and we don't have to be as accurate on matching deltas. During
> > the equivalent of `git repack` bitmaps are not used, allowing the
> > traditional graph enumeration algorithm to generate path hash
> > information.
> 
> OH BOY HERE WE GO. This is worth its own thread, lots to discuss here.
> I think peff will have a patchset regarding this to upstream soon,
> we'll get back to it later.

We do the same thing (only use bitmaps during on-the-wire fetches).  But
there are a few problems with assuming delta reuse.

For us (GitHub), the foremost one is that we pack many "forks" of a
repository together into a single packfile. That means when you clone
torvalds/linux, an object you want may be stored in the on-disk pack
with a delta against an object that you are not going to get. So we have
to throw out that delta and find a new one.

I'm dealing with that by adding an option to respect "islands" during
packing, where an island is a set of common objects (we split it by
fork, since we expect those objects to be fetched together, but you
could use other criteria). The rule is that an object cannot delta
against another object that is not in all of its islands. So everybody
can delta against shared history, but objects in your fork can only
delta against other objects in the fork.  You are guaranteed to be able
to reuse such deltas during a full clone of a fork, and the on-disk pack
size does not suffer all that much (because there is usually a good
alternate delta base within your reachable history).
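
To make the island rule concrete, here is a minimal sketch in Python. It is
not the code from the patch series; the function name, the island names, and
the object names are all made up for illustration. The only thing taken from
the description above is the rule itself: a base is allowed only if it is in
every island that the delta'd object is in.

```python
# Illustrative sketch only: names and data are hypothetical, not git's API.

def can_delta_against(islands, obj, base):
    """Return True if `base` is in every island that `obj` is in.

    `islands` maps an object name to the set of islands (here, forks)
    that can reach it. The rule is a subset test: islands(obj) must be
    contained in islands(base).
    """
    return islands[obj] <= islands[base]


# Two forks packed together; shared history is reachable from both.
islands = {
    "shared_blob": {"torvalds/linux", "alice/linux"},
    "alice_blob_1": {"alice/linux"},
    "alice_blob_2": {"alice/linux"},
    "torvalds_blob": {"torvalds/linux"},
}

# Fork-only objects may delta against shared history or their own fork.
print(can_delta_against(islands, "alice_blob_1", "shared_blob"))
print(can_delta_against(islands, "alice_blob_1", "alice_blob_2"))
# They may not delta against another fork's objects, and shared history
# may not delta against a fork-only object.
print(can_delta_against(islands, "alice_blob_1", "torvalds_blob"))
print(can_delta_against(islands, "shared_blob", "alice_blob_1"))
```

The subset test is what gives the reuse guarantee: anyone cloning a fork
receives every object in that fork's island, so any base that passes the test
is part of the same clone.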

So with that series, we can get good reuse for clones. But there are
still two cases worth considering:

  1. When you fetch a subset of the commits, git marks only the edges as
     preferred bases, and does not walk the full object graph down to
     the roots. So any object you want that is delta'd against something
     older will not get reused. If you have reachability bitmaps, I
     don't think there is any reason that we cannot use the entire
     object graph (starting at the "have" tips, of course) as preferred
     bases.

  2. The server is not necessarily fully packed. In an active repo, you
     may have a large "base" pack with bitmaps, with several recently
     pushed packs on top. You still need to delta the recently pushed
     objects against the base objects.
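
For case 1, the idea can be sketched roughly as follows. This is a toy model
in Python, not how pack-objects works: bitmaps are plain integers indexed by
pack position, and `reachable`, `reusable_deltas`, `bitmaps`, and `delta_base`
are hypothetical names. It assumes every tip involved has a bitmap. An on-disk
delta counts as reusable if its base is either being sent or is anywhere in
the graph reachable from the "have" tips, not only at the edges.

```python
# Toy model: a bitmap is an int; bit N set means "object at pack position N
# is reachable". All names here are hypothetical, for illustration only.

def reachable(bitmaps, tips):
    """OR together the reachability bitmaps of the given tips."""
    bits = 0
    for tip in tips:
        bits |= bitmaps[tip]
    return bits


def reusable_deltas(bitmaps, delta_base, wants, haves):
    """Return pack positions whose on-disk delta can be sent as-is.

    `delta_base` maps a pack position to the position of its delta base,
    or None if the object is stored whole.
    """
    have_bits = reachable(bitmaps, haves)
    send_bits = reachable(bitmaps, wants) & ~have_bits
    usable_bases = send_bits | have_bits

    result = []
    for pos, base in delta_base.items():
        if not (send_bits >> pos) & 1:
            continue  # not part of this fetch
        if base is not None and (usable_bases >> base) & 1:
            result.append(pos)
    return sorted(result)


# Positions 0-1 are old history the client has; 2-3 are new; 4 is unrelated.
bitmaps = {"old": 0b00011, "new": 0b01111}
delta_base = {0: None, 1: 0, 2: 0, 3: 4, 4: None}

# Object 2 deltas against object 0, which is deep in the client's history,
# so it is reusable. Object 3 deltas against object 4, which the client
# neither has nor receives, so it is not.
print(reusable_deltas(bitmaps, delta_base, wants=["new"], haves=["old"]))
```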

I don't have measurements on how much the deltas suffer in those two
cases. I know they suffered quite badly for clones without the name
hashes in our alternates repos, but that part should go away with my
patch series.

-Peff