git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Vicent Marti <tanoku@gmail.com>
To: git@vger.kernel.org
Cc: Vicent Marti <tanoku@gmail.com>
Subject: [PATCH 00/16] Speed up Counting Objects with bitmap data
Date: Tue, 25 Jun 2013 01:22:57 +0200	[thread overview]
Message-ID: <1372116193-32762-1-git-send-email-tanoku@gmail.com> (raw)

Hello friends and enemies from the lovevely Git Mailing list.

I bring to you a patch series that implement a quite interesting performance
optimization: the removal of the "Counting Objects" phase during `pack-objects`
by using a pre-computed bitmap to find the reachable objects in the packfile.

As you probably know, Shawn Pearce designed this approach a few months ago and
implemented it for JGit, with very exciting results.

This is a not-so-straightforward port of his original design: The general approach
is the same, but unfortunately we were not able to re-use JGit's original on-disk
format for the `.bitmap` files.

There is a full technical spec for the new format (v2) in patch 09, including
benchmarks and rationale for the new design. The gist of it is that JGit's
original format is not `mmap`able (JGit tends to not mmap anything), and that
becomes very costly in practice with `upload-pack`, which spawns a new process
for every upload.

The header and metadata for both formats are however compatible, so it should be
trivial to update JGit to read/write this format too. I intend to do this on the
coming weeks, and I also hope that the v2 implementation will be slightly faster
than the actual, even with the shortcomings of the JVM.

The patch series, although massive, is rather straightforward.

Most of the patches are isolated refactorings that enable access to a few functions
that were previously hidden (re. packfile data). These functions are needed for
reading and writing the bitmap indexes.

Patch 03 is worth noting because it implements a performance optimization for
`pack-objects` which isn't particularly good in normal invocations (~10% speed up)
but that will show great benefits in later patches when it comes to writing the
bitmap indexes.

Patch 10 is the core of the series, implementing the actual loading of bitmap indexes
and optimizing the Counting Objects phase of `pack-objects`. Like with every other
patch that offers performance improvements, sample benchmarks are provided (spoiler:
they are pretty fucking cool).

Patch 11 and 16 are samples of using the new Bitmap traversal API to speed up other
parts of Git (`rev-list --objects` and `rev-list --count`, respectively).

Patch 12, 13 and 15 implement the actual writing of bitmap indexes. Like JGit, patch
12 enables writing a bitmap index as part of the `pack-objects` process (and hence
as part of a normal `gc` run). On top of that, I implemented a new plumbing command
in patch 15 that allows to write bitmap indexes for already-existing packfiles.

I'd love your feedback on the design and implementation of this feature. I deem it
rather stable, as we've been testing it on production on the world's largest Git
host (Git Hub Dot Com The Web Site) with good results, so I'd love it to have it
upstreamed on Core Git.

Strawberry kisses,
vmg

Jeff King (1):
  list-objects: mark tree as unparsed when we free its buffer

Vicent Marti (15):
  sha1_file: refactor into `find_pack_object_pos`
  pack-objects: use a faster hash table
  pack-objects: make `pack_name_hash` global
  revision: allow setting custom limiter function
  sha1_file: export `git_open_noatime`
  compat: add endinanness helpers
  ewah: compressed bitmap implementation
  documentation: add documentation for the bitmap format
  pack-objects: use bitmaps when packing objects
  rev-list: add bitmap mode to speed up lists
  pack-objects: implement bitmap writing
  repack: consider bitmaps when performing repacks
  sha1_file: implement `nth_packed_object_info`
  write-bitmap: implement new git command to write bitmaps
  rev-list: Optimize --count using bitmaps too

 Documentation/technical/bitmap-format.txt |  235 ++++++++
 Makefile                                  |   11 +
 builtin.h                                 |    1 +
 builtin/pack-objects.c                    |  362 +++++++-----
 builtin/pack-objects.h                    |   33 ++
 builtin/rev-list.c                        |   35 +-
 builtin/write-bitmap.c                    |  256 +++++++++
 cache.h                                   |    5 +
 ewah/bitmap.c                             |  229 ++++++++
 ewah/ewah_bitmap.c                        |  703 ++++++++++++++++++++++++
 ewah/ewah_io.c                            |  199 +++++++
 ewah/ewah_rlw.c                           |  124 +++++
 ewah/ewok.h                               |  194 +++++++
 ewah/ewok_rlw.h                           |  114 ++++
 git-compat-util.h                         |   28 +
 git-repack.sh                             |   10 +-
 git.c                                     |    1 +
 khash.h                                   |  329 +++++++++++
 list-objects.c                            |    1 +
 pack-bitmap-write.c                       |  520 ++++++++++++++++++
 pack-bitmap.c                             |  855 +++++++++++++++++++++++++++++
 pack-bitmap.h                             |   64 +++
 pack-write.c                              |    2 +
 revision.c                                |    5 +
 revision.h                                |    2 +
 sha1_file.c                               |   57 +-
 26 files changed, 4212 insertions(+), 163 deletions(-)
 create mode 100644 Documentation/technical/bitmap-format.txt
 create mode 100644 builtin/pack-objects.h
 create mode 100644 builtin/write-bitmap.c
 create mode 100644 ewah/bitmap.c
 create mode 100644 ewah/ewah_bitmap.c
 create mode 100644 ewah/ewah_io.c
 create mode 100644 ewah/ewah_rlw.c
 create mode 100644 ewah/ewok.h
 create mode 100644 ewah/ewok_rlw.h
 create mode 100644 khash.h
 create mode 100644 pack-bitmap-write.c
 create mode 100644 pack-bitmap.c
 create mode 100644 pack-bitmap.h

-- 
1.7.9.5

             reply	other threads:[~2013-06-24 23:23 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-24 23:22 Vicent Marti [this message]
2013-06-24 23:22 ` [PATCH 01/16] list-objects: mark tree as unparsed when we free its buffer Vicent Marti
2013-06-24 23:22 ` [PATCH 02/16] sha1_file: refactor into `find_pack_object_pos` Vicent Marti
2013-06-25 13:59   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 03/16] pack-objects: use a faster hash table Vicent Marti
2013-06-25 14:03   ` Thomas Rast
2013-06-26  2:14     ` Jeff King
2013-06-26  4:47       ` Jeff King
2013-06-25 17:58   ` Ramkumar Ramachandra
2013-06-25 22:48   ` Junio C Hamano
2013-06-25 23:09     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 04/16] pack-objects: make `pack_name_hash` global Vicent Marti
2013-06-24 23:23 ` [PATCH 05/16] revision: allow setting custom limiter function Vicent Marti
2013-06-24 23:23 ` [PATCH 06/16] sha1_file: export `git_open_noatime` Vicent Marti
2013-06-24 23:23 ` [PATCH 07/16] compat: add endinanness helpers Vicent Marti
2013-06-25 13:08   ` Peter Krefting
2013-06-25 13:25     ` Vicent Martí
2013-06-27  5:56       ` Peter Krefting
2013-06-24 23:23 ` [PATCH 08/16] ewah: compressed bitmap implementation Vicent Marti
2013-06-25  1:10   ` Junio C Hamano
2013-06-25 22:51     ` Junio C Hamano
2013-06-25 15:38   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 09/16] documentation: add documentation for the bitmap format Vicent Marti
2013-06-25  5:42   ` Shawn Pearce
2013-06-25 19:33     ` Vicent Martí
2013-06-25 21:17       ` Junio C Hamano
2013-06-25 22:08         ` Vicent Martí
2013-06-27  1:11           ` Shawn Pearce
2013-06-27  2:36             ` Vicent Martí
2013-06-27  2:45               ` Jeff King
2013-06-27 16:07                 ` Shawn Pearce
2013-06-27 17:17                   ` Jeff King
2013-07-01 18:47                   ` Colby Ranger
2013-07-01 19:13                     ` Shawn Pearce
2013-07-07  9:46                     ` Jeff King
2013-07-07 17:27                       ` Shawn Pearce
2013-06-26  5:11       ` Jeff King
2013-06-26 18:41         ` Colby Ranger
2013-06-26 22:33           ` Colby Ranger
2013-06-27  0:53             ` Colby Ranger
2013-06-27  1:32               ` Shawn Pearce
2013-06-27  1:29         ` Shawn Pearce
2013-06-25 15:58   ` Thomas Rast
2013-06-25 22:30     ` Vicent Martí
2013-06-26 23:12       ` Thomas Rast
2013-06-26 23:19         ` Thomas Rast
2013-06-24 23:23 ` [PATCH 10/16] pack-objects: use bitmaps when packing objects Vicent Marti
2013-06-25 12:48   ` Ramkumar Ramachandra
2013-06-25 15:58   ` Thomas Rast
2013-06-25 23:06   ` Junio C Hamano
2013-06-25 23:14     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 11/16] rev-list: add bitmap mode to speed up lists Vicent Marti
2013-06-25 16:22   ` Thomas Rast
2013-06-26  1:45     ` Vicent Martí
2013-06-26 23:13       ` Thomas Rast
2013-06-26  5:22     ` Jeff King
2013-06-24 23:23 ` [PATCH 12/16] pack-objects: implement bitmap writing Vicent Marti
2013-06-24 23:23 ` [PATCH 13/16] repack: consider bitmaps when performing repacks Vicent Marti
2013-06-25 23:00   ` Junio C Hamano
2013-06-25 23:16     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 14/16] sha1_file: implement `nth_packed_object_info` Vicent Marti
2013-06-24 23:23 ` [PATCH 15/16] write-bitmap: implement new git command to write bitmaps Vicent Marti
2013-06-24 23:23 ` [PATCH 16/16] rev-list: Optimize --count using bitmaps too Vicent Marti
2013-06-25 16:05 ` [PATCH 00/16] Speed up Counting Objects with bitmap data Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1372116193-32762-1-git-send-email-tanoku@gmail.com \
    --to=tanoku@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).