git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Vicent Marti <vicent@github.com>
Subject: [PATCH 0/19] pack bitmaps
Date: Thu, 24 Oct 2013 13:59:15 -0400
Message-ID: <20131024175915.GA23398@sigill.intra.peff.net>

This series implements JGit-style pack bitmaps to speed up fetching and
cloning. For example, here is a simulation of the server side of a clone
of a fully-packed kernel repo (measuring actual clones is harder,
because the client does a lot of work on resolving deltas):

   [before]
   $ time git pack-objects --all --stdout </dev/null >/dev/null
   Counting objects: 3237103, done.
   Compressing objects: 100% (508752/508752), done.
   Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

   real    0m44.111s
   user    0m42.396s
   sys     0m3.544s


   [after]
   $ time git pack-objects --all --stdout </dev/null >/dev/null
   Reusing existing pack: 3237103, done.
   Total 3237103 (delta 0), reused 0 (delta 0)

   real    0m1.636s
   user    0m1.460s
   sys     0m0.172s


This helps eliminate load on the server side, but it also means that we
actually start transferring objects way faster, which means the clones
finish faster. If you look at current clones of torvalds/linux from
kernel.org, it's almost two minutes before they actually start sending
you any data, during which time the client is twiddling its thumbs.

The bitmaps implemented here are compatible with those produced by JGit.
We can read JGit-produced bitmaps, and JGit can read ours. The one
exception is the final patch, which adds an optional name-hash cache.
It's added in such a way that existing implementations can ignore it,
and is marked with a flag in the header. However, JGit is very picky
about the "flags" field; it will reject any bitmap index with a flag it
does not know about, so an index written with the name-hash cache
enabled will not be accepted by JGit as it stands.
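
To make the difference in flag handling concrete, here is a minimal
sketch in Python (it is not code from this series). The flag names and
values below are assumptions chosen for illustration; the authoritative
definitions are in the format documentation added by patch 09/19. It
contrasts a strict reader, which rejects any unknown bit, with a lenient
reader, which masks off bits it does not understand.

```python
# Hypothetical flag bits, for illustration only. The real values are
# defined by the bitmap format documentation, not by this sketch.
FLAG_FULL_DAG = 0x1     # assumed: a flag every reader already knows
FLAG_HASH_CACHE = 0x4   # assumed: the optional name-hash cache extension


def strict_reader_accepts(flags: int, known: int) -> bool:
    """Strict policy: reject the index if any bit outside `known` is set."""
    return (flags & ~known) == 0


def lenient_reader_flags(flags: int, known: int) -> int:
    """Lenient policy: keep only the bits we understand, ignore the rest."""
    return flags & known


if __name__ == "__main__":
    with_cache = FLAG_FULL_DAG | FLAG_HASH_CACHE
    # A reader that only knows FLAG_FULL_DAG:
    print(strict_reader_accepts(FLAG_FULL_DAG, FLAG_FULL_DAG))
    print(strict_reader_accepts(with_cache, FLAG_FULL_DAG))
    print(lenient_reader_flags(with_cache, FLAG_FULL_DAG) == FLAG_FULL_DAG)
```

Under the strict policy, an index that sets the extra bit is refused
outright; under the lenient policy, the reader proceeds as if the
extension were absent.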

The patches are:

  [01/19]: sha1write: make buffer const-correct
  [02/19]: revindex: Export new APIs
  [03/19]: pack-objects: Refactor the packing list
  [04/19]: pack-objects: factor out name_hash
  [05/19]: revision: allow setting custom limiter function
  [06/19]: sha1_file: export `git_open_noatime`
  [07/19]: compat: add endianness helpers
  [08/19]: ewah: compressed bitmap implementation

    Refactoring and support for the rest of the series.

  [09/19]: documentation: add documentation for the bitmap format
  [10/19]: pack-bitmap: add support for bitmap indexes
  [11/19]: pack-objects: use bitmaps when packing objects
  [12/19]: rev-list: add bitmap mode to speed up object lists

    Bitmap reading (you can test it against JGit at this point by
    running "jgit debug-gc", and then cloning or running rev-list).

  [13/19]: pack-objects: implement bitmap writing
  [14/19]: repack: stop using magic number for ARRAY_SIZE(exts)
  [15/19]: repack: turn exts array into array-of-struct
  [16/19]: repack: handle optional files created by pack-objects
  [17/19]: repack: consider bitmaps when performing repacks

    Bitmap writing (you can test against JGit by running
    "git repack -adb", and then running "jgit daemon" to
    serve the result).

  [18/19]: t: add basic bitmap functionality tests

    With reading and writing, we can do our own tests.

  [19/19]: pack-bitmap: implement optional name_hash cache

    And this is our extension.

A similar series has been running on github.com for the past couple of
months, though not every repository has had bitmaps turned on (but some
very busy ones have).  We've hopefully squeezed out all of the bugs and
corner cases over that time. However, I did rebase this on a more modern
version of "master"; among other conflicts, this required porting the
git-repack changes from shell to C. So it's entirely possible I've
introduced new bugs. :)

The idea and original implementation for bitmaps come from Shawn and
Colby, of course. The hard work in this series was done by Vicent Marti,
and he is credited as the author in most of the patches. I've added some
window dressing and helped a little with debugging and review. But along
with Vicent, I should be able to help with answering questions for
review, and as time goes on, I'm familiar enough with the code to deal
with bugs and review future changes.

-Peff
