git@vger.kernel.org mailing list mirror (one of many)
From: Han-Wen Nienhuys <hanwen@google.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>,
	git@vger.kernel.org
Subject: Re: What is the status of GSoC 2022 work on making Git use roaring bitmaps?
Date: Tue, 1 Aug 2023 13:54:11 +0200
Message-ID: <CAFQ2z_MmUDMTc7wyR1X8oxXdtz54_0HZmS2Q8iv9YMoqZmh0hQ@mail.gmail.com>
In-Reply-To: <CANQwDwe8Po-2KxNjWQ+RW+hYGLF=4sYTjQJxcVSAOtbtpfVRhQ@mail.gmail.com>

On Tue, Aug 1, 2023 at 1:35 PM Jakub Narębski <jnareb@gmail.com> wrote:
>
> Hello,
>
> On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys <hanwen@google.com> wrote:
> > On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau <me@ttaylorr.com> wrote:
> > >
> > > I haven't proved conclusively one way or the other where Roaring+Run is
> > > significantly faster than EWAH or vice-versa. There are some cases where
> > > the former is a clear winner, and other cases where it's the latter.
> > >
> > > In any event, my extremely WIP patches to make this mostly work are
> > > available here:
> > >
> > >   https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps
> > >
> >
> > thanks. For anyone reading along, the changes to JGit are here
> >
> > https://git.eclipse.org/r/c/jgit/jgit/+/203448
> >
> > I was looking into this because I was hoping that roaring might
> > decrease peak memory usage.
> >
> > I don't have firm evidence that it's better or worse, but I did
> > observe that runtime and memory usage during GC (which is heavy on
> > bitmap operations due to delta/xor encoding) was unchanged. That makes
> > me pessimistic that there are significant gains to be had.
>
> The major advantage Roaring bitmaps have over EWAH and other
> simple Run Length Encoding based compression algorithms is that
> bitmap operations can be done on compressed bitmaps: there is no
> need to uncompress bitmap to do (want1 OR want2 AND NOT have).

Are you sure? The JavaEWAH source code for and() and andNot() looks
rather similar, in that both seem to operate on whole RLE sections at
a time:

https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L498

https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L405

Looking at the EWAH format as documented for git-bitmap-format, EWAH
allows for RLE on both 1s and 0s. So it should be possible to
efficiently clear out a whole section of the target whenever the second
operand of andNot has an RLE-encoded run of 1s.
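
To make the argument concrete, here is a minimal sketch of andNot on
run-length-encoded input. It is not the JavaEWAH or git code: it assumes a
simplified, hypothetical representation in which a bitmap is a list of
(bit, length) runs with positive lengths, and it ignores the literal words
that real EWAH interleaves with its runs. Both operands are assumed to cover
the same number of bits.

```python
def and_not(a, b):
    """Return a AND NOT b, where a and b are lists of (bit, length) runs.

    Simplifying assumptions (not real EWAH): runs only, no literal words,
    every run has length > 0, and both inputs cover the same bit count.
    """
    out = []
    i = j = 0                 # index of the next run to load from a and b
    left_a = left_b = 0       # bits still unconsumed in the current runs
    bit_a = bit_b = 0
    while True:
        if left_a == 0:
            if i == len(a):
                break
            bit_a, left_a = a[i]
            i += 1
        if left_b == 0:
            if j == len(b):
                break
            bit_b, left_b = b[j]
            j += 1
        # Consume the overlap of the two current runs in a single step.
        step = min(left_a, left_b)
        # A run of 1s in b clears the whole overlap, whatever a contains.
        bit = 0 if bit_b else bit_a
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)   # extend the previous run
        else:
            out.append((bit, step))
        left_a -= step
        left_b -= step
    return out


def decode(runs):
    """Expand runs into a flat list of bits (only used for checking)."""
    return [bit for bit, length in runs for _ in range(length)]
```

The loop does work proportional to the number of runs, not the number of
bits, which is the property in question. Whether the real implementations
keep that property once literal words are involved is not something this
sketch shows.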

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
