git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Shawn Pearce <spearce@spearce.org>
To: Jeff King <peff@peff.net>
Cc: "Vicent Martí" <tanoku@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Colby Ranger" <cranger@google.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Thu, 27 Jun 2013 09:07:38 -0700	[thread overview]
Message-ID: <CAJo=hJvOq=CATrDeYAwi+jgkPpqjywWhuKeC1TVYeCXr6NVM6w@mail.gmail.com> (raw)
In-Reply-To: <20130627024521.GA6936@sigill.intra.peff.net>

On Wed, Jun 26, 2013 at 7:45 PM, Jeff King <peff@peff.net> wrote:
>
> In particular, it seems like the slowness we saw with the v1 bitmap
> format is not what Shawn and Colby have experienced. So it's possible
> that our test setup is bad or different. Or maybe the C v1 reading
> implementation had some problems that are fixable. It's hard to say
> because we haven't shown any code that can be timed and compared.

Right, the format and implementation in JGit can do "Counting objects"
in 87ms for the Linux kernel history. But I think we are comparing
apples to steaks here, Vincent is (rightfully) concerned about process
startup performance, whereas our timings were assuming the process was
already running.

It would help everyone to understand the issues involved if we are at
least looking at the same file format.

> And the pack-order versus idx-order for the bitmaps is still up in the
> air. Do we have numbers on the on-disk sizes of the resulting EWAHs?

I did not see any presented in this thread, and I am very interested
in this aspect of the series. The path hash cache should be taking
about 9.9M of disk space, but I recall reading the bitmap file is 8M.
I don't understand.

Colby and I were very concerned about the size of the EWAH compressed
bitmaps because we wanted hundreds of them for a large history like
the kernel, and we wanted to minimize the amount of memory consumed by
the bitmap index when loaded into the process.

In the JGit implementation our copy of Linus' tree has 3.1M objects,
an 81.5 MiB idx file, and a 3.8 MiB bitmap file. We were trying to
keep the overhead below 10% of the idx file, and I think we have
succeeded on that. With 3.1M objects the v2 bitmap proposed in this
thread needs at least 11.8M, or 14+% overhead just for the path hash
cache.

The path hash cache may still be required, Colby and I have been
debating the merits of having the data available for delta compression
vs. the increase in memory required to hold it.

> The
> pack-order ones should be more amenable to run-length encoding,
> especially as you get further down into history (the tip ones would
> mostly be 1's, no matter how you order them).

This is also true for the type bitmaps, but especially so in the JGit
file ordering where we always write all trees before any blobs. The
type bitmaps are very compact and basically amount to defining a
single range in the file. This takes only a few words in the EWAH
compressed format.

  reply	other threads:[~2013-06-27 16:08 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-24 23:22 [PATCH 00/16] Speed up Counting Objects with bitmap data Vicent Marti
2013-06-24 23:22 ` [PATCH 01/16] list-objects: mark tree as unparsed when we free its buffer Vicent Marti
2013-06-24 23:22 ` [PATCH 02/16] sha1_file: refactor into `find_pack_object_pos` Vicent Marti
2013-06-25 13:59   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 03/16] pack-objects: use a faster hash table Vicent Marti
2013-06-25 14:03   ` Thomas Rast
2013-06-26  2:14     ` Jeff King
2013-06-26  4:47       ` Jeff King
2013-06-25 17:58   ` Ramkumar Ramachandra
2013-06-25 22:48   ` Junio C Hamano
2013-06-25 23:09     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 04/16] pack-objects: make `pack_name_hash` global Vicent Marti
2013-06-24 23:23 ` [PATCH 05/16] revision: allow setting custom limiter function Vicent Marti
2013-06-24 23:23 ` [PATCH 06/16] sha1_file: export `git_open_noatime` Vicent Marti
2013-06-24 23:23 ` [PATCH 07/16] compat: add endinanness helpers Vicent Marti
2013-06-25 13:08   ` Peter Krefting
2013-06-25 13:25     ` Vicent Martí
2013-06-27  5:56       ` Peter Krefting
2013-06-24 23:23 ` [PATCH 08/16] ewah: compressed bitmap implementation Vicent Marti
2013-06-25  1:10   ` Junio C Hamano
2013-06-25 22:51     ` Junio C Hamano
2013-06-25 15:38   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 09/16] documentation: add documentation for the bitmap format Vicent Marti
2013-06-25  5:42   ` Shawn Pearce
2013-06-25 19:33     ` Vicent Martí
2013-06-25 21:17       ` Junio C Hamano
2013-06-25 22:08         ` Vicent Martí
2013-06-27  1:11           ` Shawn Pearce
2013-06-27  2:36             ` Vicent Martí
2013-06-27  2:45               ` Jeff King
2013-06-27 16:07                 ` Shawn Pearce [this message]
2013-06-27 17:17                   ` Jeff King
2013-07-01 18:47                   ` Colby Ranger
2013-07-01 19:13                     ` Shawn Pearce
2013-07-07  9:46                     ` Jeff King
2013-07-07 17:27                       ` Shawn Pearce
2013-06-26  5:11       ` Jeff King
2013-06-26 18:41         ` Colby Ranger
2013-06-26 22:33           ` Colby Ranger
2013-06-27  0:53             ` Colby Ranger
2013-06-27  1:32               ` Shawn Pearce
2013-06-27  1:29         ` Shawn Pearce
2013-06-25 15:58   ` Thomas Rast
2013-06-25 22:30     ` Vicent Martí
2013-06-26 23:12       ` Thomas Rast
2013-06-26 23:19         ` Thomas Rast
2013-06-24 23:23 ` [PATCH 10/16] pack-objects: use bitmaps when packing objects Vicent Marti
2013-06-25 12:48   ` Ramkumar Ramachandra
2013-06-25 15:58   ` Thomas Rast
2013-06-25 23:06   ` Junio C Hamano
2013-06-25 23:14     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 11/16] rev-list: add bitmap mode to speed up lists Vicent Marti
2013-06-25 16:22   ` Thomas Rast
2013-06-26  1:45     ` Vicent Martí
2013-06-26 23:13       ` Thomas Rast
2013-06-26  5:22     ` Jeff King
2013-06-24 23:23 ` [PATCH 12/16] pack-objects: implement bitmap writing Vicent Marti
2013-06-24 23:23 ` [PATCH 13/16] repack: consider bitmaps when performing repacks Vicent Marti
2013-06-25 23:00   ` Junio C Hamano
2013-06-25 23:16     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 14/16] sha1_file: implement `nth_packed_object_info` Vicent Marti
2013-06-24 23:23 ` [PATCH 15/16] write-bitmap: implement new git command to write bitmaps Vicent Marti
2013-06-24 23:23 ` [PATCH 16/16] rev-list: Optimize --count using bitmaps too Vicent Marti
2013-06-25 16:05 ` [PATCH 00/16] Speed up Counting Objects with bitmap data Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJo=hJvOq=CATrDeYAwi+jgkPpqjywWhuKeC1TVYeCXr6NVM6w@mail.gmail.com' \
    --to=spearce@spearce.org \
    --cc=cranger@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=tanoku@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).