git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Vicent Martí" <tanoku@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Colby Ranger <cranger@google.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Date: Thu, 27 Jun 2013 04:36:54 +0200	[thread overview]
Message-ID: <CAFFjANSr2QRLE8DSPP2zZ_baEZUqR8dzkPzMwqyEqgFX=8cnog@mail.gmail.com> (raw)
In-Reply-To: <CAJo=hJtJoizQUubriTPvs2bsjvw+N82MCPvw263fUB8vv8_VVA@mail.gmail.com>

That was a very rude reply. :(

Please refrain from interacting with me in the ML in the future. I'l
do accordingly.

Thanks!
vmg

On Thu, Jun 27, 2013 at 3:11 AM, Shawn Pearce <spearce@spearce.org> wrote:
> On Tue, Jun 25, 2013 at 4:08 PM, Vicent Martí <tanoku@gmail.com> wrote:
>> On Tue, Jun 25, 2013 at 11:17 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> What case are you talking about?
>>>
>>> The n-th object must be one of these four types and can never be of
>>> more than one type at the same time, so a natural expectation from
>>> the reader is "If you OR them together, you will get the same set".
>>> If you say "If you XOR them", that forces the reader to wonder when
>>> these bitmaps ever can overlap at the same bit position.
>>
>> I guess this is just wording. I don't particularly care about the
>> distinction, but I'll change it to OR.
>
> Hmm, OK. If you think XOR and OR are the same operation, I also have a
> bridge to sell you. Its in Brooklyn. Its a great value.
>
> The correct operation is OR. Not XOR. OR. Drop the X.
>
>> It cannot be mmapped not particularly because of endianness issues,
>> but because the original format is not indexed and requires a full
>> parse of the whole index before it can be accessed programatically.
>> The wrong endianness just increases the parse time.
>
> Wrong endianness has nothing to do with the parse time. Modern CPUs
> can flip a word around very quickly. In JGit we chose to parse the
> file at load time because its simpler than having an additional index
> segment, and we do what you did which is to toss the object SHA-1s
> into a hashtable for fast lookup. By the time we look for the SHA-1s
> and toss them into a hashtable we can stride through the file and find
> the bitmap regions. Simple.
>
> In other words, the least complex solution possible that still
> provides good performance. I'd say we have pretty good performance.
>
>>>> and I'm going to try to make it run fast enough in that
>>>> encoding.
>>>
>>> Hmph.  Is it an option to start from what JGit does, so that people
>>> can use both JGit and your code on the same repository?
>
> I'm afraid I agree here with Junio. The JGit format is already
> shipping in JGit 3.0, Gerrit Code Review 2.6, and in heavy production
> use for almost a year on android.googlesource.com, and Google's own
> internal Git trees.
>
> I would prefer to see a series adding bitmap support to C Git start
> with the existing format, make it run, taking advantage of the
> optimizations JGit uses (many of which you ignored and tried to "fix"
> in other ways), and then look at improving the file format itself if
> load time is still the largest low hanging fruit in upload-pack. I'm
> guessing its not. You futzed around with the object table, but JGit
> sped itself up considerably by simply not using the object table when
> the bitmap is used. I think there are several such optimizations you
> missed in your rush to redefine the file format.
>
>>>  And then if
>>> you do not succeed, after trying to optimize in-core processing
>>> using that on-disk format to make it fast enough, start thinking
>>> about tweaking the on-disk format?
>>
>> I'm afraid this is not an option. I have an old patchset that
>> implements JGit v1 bitmap loading (and in fact that's how I initially
>> developed these series -- by loading the bitmaps from JGit for
>> debugging), but I discarded it because it simply doesn't pan out in
>> production. ~3 seconds time to spawn `upload-pack` is not an option
>> for us. I did not develop a tweaked on-disk format out of boredom.
>
> I think your code or experiments are bogus. Even on our systems with
> JGit a cold start for the Linux kernel doesn't take 3s. And this is
> JGit where Java is slow because "Jesus it has a lot of factories", and
> without mmap'ing the file into the server's address space. Hell the
> file has to come over the network from a remote disk array.
>
>> I could dig up the patch if you're particularly interested in
>> backwards compatibility, but since it was several times slower than
>> the current iteration, I have no interest (time, actually) to maintain
>> it, brush it up, and so on. I have already offered myself to port the
>> v2 format to JGit as soon as it's settled. It sounds like a better
>> investment of all our times.
>
> Actually, I think the format you propose here is inferior to the JGit
> format. In particular the idx-ordering means the EWAH code is useless.
> You might as well not use the EWAH format and just store 2.6M bits per
> commit. The idx-ordering also makes *much* harder to emit a pack file
> a reasonable order for the client. Colby and I tried idx-ordering and
> discarded it when it didn't perform as well as the pack-ordering that
> JGit uses.
>
>> Following up on Shawn's comments, I removed the little-endian support
>> from the on-disk format and implemented lazy loading of the bitmaps to
>> make up for it. The result is decent (slowed down from 250ms to 300ms)
>> and it lets us keep the whole format as NWO on disk. I think it's a
>> good tradeback.
>
> The maintenance burden of two endian formats in a single file is too
> high to justify. I'm glad to see you saw that.
>
>> As it stands right now, the only two changes from v1 of the on-disk format are:
>>
>> - There is an index at the end. This is a good idea.
>
> I don't think the index is necessary if you plan to build a hashtable
> at runtime anyway. If you mmap the file you can quickly skip over a
> bitmap and find the next SHA-1 using this thing called "pointer
> arithmetic". I am not sure if you are familiar with the term, perhaps
> you could search the web for it.
>
>> - The bitmaps are sorted in packfile-index order, not in packfile
>> order. This is a good idea.
>
> As Colby and I have repeatedly tried to explain, this is not a good idea.
>
>> German kisses,
>
> Strawberry and now German kisses? What's next, Mango kisses?

  reply	other threads:[~2013-06-27  2:37 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-24 23:22 [PATCH 00/16] Speed up Counting Objects with bitmap data Vicent Marti
2013-06-24 23:22 ` [PATCH 01/16] list-objects: mark tree as unparsed when we free its buffer Vicent Marti
2013-06-24 23:22 ` [PATCH 02/16] sha1_file: refactor into `find_pack_object_pos` Vicent Marti
2013-06-25 13:59   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 03/16] pack-objects: use a faster hash table Vicent Marti
2013-06-25 14:03   ` Thomas Rast
2013-06-26  2:14     ` Jeff King
2013-06-26  4:47       ` Jeff King
2013-06-25 17:58   ` Ramkumar Ramachandra
2013-06-25 22:48   ` Junio C Hamano
2013-06-25 23:09     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 04/16] pack-objects: make `pack_name_hash` global Vicent Marti
2013-06-24 23:23 ` [PATCH 05/16] revision: allow setting custom limiter function Vicent Marti
2013-06-24 23:23 ` [PATCH 06/16] sha1_file: export `git_open_noatime` Vicent Marti
2013-06-24 23:23 ` [PATCH 07/16] compat: add endinanness helpers Vicent Marti
2013-06-25 13:08   ` Peter Krefting
2013-06-25 13:25     ` Vicent Martí
2013-06-27  5:56       ` Peter Krefting
2013-06-24 23:23 ` [PATCH 08/16] ewah: compressed bitmap implementation Vicent Marti
2013-06-25  1:10   ` Junio C Hamano
2013-06-25 22:51     ` Junio C Hamano
2013-06-25 15:38   ` Thomas Rast
2013-06-24 23:23 ` [PATCH 09/16] documentation: add documentation for the bitmap format Vicent Marti
2013-06-25  5:42   ` Shawn Pearce
2013-06-25 19:33     ` Vicent Martí
2013-06-25 21:17       ` Junio C Hamano
2013-06-25 22:08         ` Vicent Martí
2013-06-27  1:11           ` Shawn Pearce
2013-06-27  2:36             ` Vicent Martí [this message]
2013-06-27  2:45               ` Jeff King
2013-06-27 16:07                 ` Shawn Pearce
2013-06-27 17:17                   ` Jeff King
2013-07-01 18:47                   ` Colby Ranger
2013-07-01 19:13                     ` Shawn Pearce
2013-07-07  9:46                     ` Jeff King
2013-07-07 17:27                       ` Shawn Pearce
2013-06-26  5:11       ` Jeff King
2013-06-26 18:41         ` Colby Ranger
2013-06-26 22:33           ` Colby Ranger
2013-06-27  0:53             ` Colby Ranger
2013-06-27  1:32               ` Shawn Pearce
2013-06-27  1:29         ` Shawn Pearce
2013-06-25 15:58   ` Thomas Rast
2013-06-25 22:30     ` Vicent Martí
2013-06-26 23:12       ` Thomas Rast
2013-06-26 23:19         ` Thomas Rast
2013-06-24 23:23 ` [PATCH 10/16] pack-objects: use bitmaps when packing objects Vicent Marti
2013-06-25 12:48   ` Ramkumar Ramachandra
2013-06-25 15:58   ` Thomas Rast
2013-06-25 23:06   ` Junio C Hamano
2013-06-25 23:14     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 11/16] rev-list: add bitmap mode to speed up lists Vicent Marti
2013-06-25 16:22   ` Thomas Rast
2013-06-26  1:45     ` Vicent Martí
2013-06-26 23:13       ` Thomas Rast
2013-06-26  5:22     ` Jeff King
2013-06-24 23:23 ` [PATCH 12/16] pack-objects: implement bitmap writing Vicent Marti
2013-06-24 23:23 ` [PATCH 13/16] repack: consider bitmaps when performing repacks Vicent Marti
2013-06-25 23:00   ` Junio C Hamano
2013-06-25 23:16     ` Vicent Martí
2013-06-24 23:23 ` [PATCH 14/16] sha1_file: implement `nth_packed_object_info` Vicent Marti
2013-06-24 23:23 ` [PATCH 15/16] write-bitmap: implement new git command to write bitmaps Vicent Marti
2013-06-24 23:23 ` [PATCH 16/16] rev-list: Optimize --count using bitmaps too Vicent Marti
2013-06-25 16:05 ` [PATCH 00/16] Speed up Counting Objects with bitmap data Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFFjANSr2QRLE8DSPP2zZ_baEZUqR8dzkPzMwqyEqgFX=8cnog@mail.gmail.com' \
    --to=tanoku@gmail.com \
    --cc=cranger@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).