git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Vicent Martí" <tanoku@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
	Colby Ranger <cranger@google.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Tue, 25 Jun 2013 14:17:20 -0700	[thread overview]
Message-ID: <7vtxkl28m7.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <CAFFjANRwBBcORhu4mwjESBfr4GJ3zDrgYvUhY=VxK9abv7k2MA@mail.gmail.com> ("Vicent Martí"'s message of "Tue, 25 Jun 2013 21:33:11 +0200")

Vicent Martí <tanoku@gmail.com> writes:

>>> +               There is a bitmap for each Git object type, stored in the following
>>> +               order:
>>> +
>>> +                       - Commits
>>> +                       - Trees
>>> +                       - Blobs
>>> +                       - Tags
>>> +
>>> +               In each bitmap, the `n`th bit is set to true if the `n`th object
>>> +               in the packfile index is of that type.
>>> +
>>> +               The obvious consequence is that the XOR of all 4 bitmaps will result
>>> +               in a full set (all bits sets), and the AND of all 4 bitmaps will
>>> +               result in an empty bitmap (no bits set).
>>
>> Instead of XOR did you mean OR here?
>
> Nope, I think XOR makes it more obvious: if the same bit is set on two
> bitmaps, it would be cleared when XORed together, and hence all the
> bits wouldn't be set. An OR would hide this case.

What case are you talking about?

The n-th object must be one of these four types and can never be of
more than one type at the same time, so a natural expectation from
the reader is "If you OR them together, you will get the same set".
If you say "If you XOR them", that forces the reader to wonder when
these bitmaps ever can overlap at the same bit position.

> To sum it up: I'd like to see this format be strictly in Network Byte
> Order,

Good.

I've been wondering what you meant by "cannot be mmap-ed" from the
very beginning.  We mmapped the index for a long time, and it is
defined in terms of network byte order.  Of course, pack .idx files
are in network byte order, too, and we mmap them without problems.
It seems that it primarily came from your fear that using network
byte order may be unnecessarily hard to perform well, and it would
be a good thing to do to try to do so first instead of punting from
the beginning.

> and I'm going to try to make it run fast enough in that
> encoding.

Hmph.  Is it an option to start from what JGit does, so that people
can use both JGit and your code on the same repository?  And then if
you do not succeed, after trying to optimize in-core processing
using that on-disk format to make it fast enough, start thinking
about tweaking the on-disk format?

Thanks.

  reply	other threads:[~2013-06-25 21:17 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-24 23:22 [PATCH 00/16] Speed up Counting Objects with bitmap data Vicent Marti
2013-06-24 23:22 ` [PATCH 01/16] list-objects: mark tree as unparsed when we free its buffer Vicent Marti
2013-06-24 23:22 ` [PATCH 02/16] sha1_file: refactor into `find_pack_object_pos` Vicent Marti
2013-06-25 13:59   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 03/16] pack-objects: use a faster hash table Vicent Marti
2013-06-25 14:03   ` Thomas Rast
2013-06-26  2:14     ` Jeff King
2013-06-26  4:47       ` Jeff King
2013-06-25 17:58   ` Ramkumar Ramachandra
2013-06-25 22:48   ` Junio C Hamano
2013-06-25 23:09     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 04/16] pack-objects: make `pack_name_hash` global Vicent Marti
2013-06-24 23:23 ` [PATCH 05/16] revision: allow setting custom limiter function Vicent Marti
2013-06-24 23:23 ` [PATCH 06/16] sha1_file: export `git_open_noatime` Vicent Marti
2013-06-24 23:23 ` [PATCH 07/16] compat: add endinanness helpers Vicent Marti
2013-06-25 13:08   ` Peter Krefting
2013-06-25 13:25     ` Vicent Martí
2013-06-27  5:56       ` Peter Krefting
2013-06-24 23:23 ` [PATCH 08/16] ewah: compressed bitmap implementation Vicent Marti
2013-06-25  1:10   ` Junio C Hamano
2013-06-25 22:51     ` Junio C Hamano
2013-06-25 15:38   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 09/16] documentation: add documentation for the bitmap format Vicent Marti
2013-06-25  5:42   ` Shawn Pearce
2013-06-25 19:33     ` Vicent Martí
2013-06-25 21:17       ` Junio C Hamano [this message]
2013-06-25 22:08         ` Vicent Martí
2013-06-27  1:11           ` Shawn Pearce
2013-06-27  2:36             ` Vicent Martí
2013-06-27  2:45               ` Jeff King
2013-06-27 16:07                 ` Shawn Pearce
2013-06-27 17:17                   ` Jeff King
2013-07-01 18:47                   ` Colby Ranger
2013-07-01 19:13                     ` Shawn Pearce
2013-07-07  9:46                     ` Jeff King
2013-07-07 17:27                       ` Shawn Pearce
2013-06-26  5:11       ` Jeff King
2013-06-26 18:41         ` Colby Ranger
2013-06-26 22:33           ` Colby Ranger
2013-06-27  0:53             ` Colby Ranger
2013-06-27  1:32               ` Shawn Pearce
2013-06-27  1:29         ` Shawn Pearce
2013-06-25 15:58   ` Thomas Rast
2013-06-25 22:30     ` Vicent Martí
2013-06-26 23:12       ` Thomas Rast
2013-06-26 23:19         ` Thomas Rast
2013-06-24 23:23 ` [PATCH 10/16] pack-objects: use bitmaps when packing objects Vicent Marti
2013-06-25 12:48   ` Ramkumar Ramachandra
2013-06-25 15:58   ` Thomas Rast
2013-06-25 23:06   ` Junio C Hamano
2013-06-25 23:14     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 11/16] rev-list: add bitmap mode to speed up lists Vicent Marti
2013-06-25 16:22   ` Thomas Rast
2013-06-26  1:45     ` Vicent Martí
2013-06-26 23:13       ` Thomas Rast
2013-06-26  5:22     ` Jeff King
2013-06-24 23:23 ` [PATCH 12/16] pack-objects: implement bitmap writing Vicent Marti
2013-06-24 23:23 ` [PATCH 13/16] repack: consider bitmaps when performing repacks Vicent Marti
2013-06-25 23:00   ` Junio C Hamano
2013-06-25 23:16     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 14/16] sha1_file: implement `nth_packed_object_info` Vicent Marti
2013-06-24 23:23 ` [PATCH 15/16] write-bitmap: implement new git command to write bitmaps Vicent Marti
2013-06-24 23:23 ` [PATCH 16/16] rev-list: Optimize --count using bitmaps too Vicent Marti
2013-06-25 16:05 ` [PATCH 00/16] Speed up Counting Objects with bitmap data Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vtxkl28m7.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=cranger@google.com \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    --cc=tanoku@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).