From: Derrick Stolee <derrickstolee@github.com>
To: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Cc: Abhradeep Chakraborty via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
Kaartic Sivaram <kaartic.sivaraam@gmail.com>
Subject: Re: [PATCH 3/5] roaring: teach Git to write roaring bitmaps
Date: Sun, 30 Oct 2022 15:46:20 -0400 [thread overview]
Message-ID: <58841dcd-e732-416f-5ab0-fd5a5d8de4c7@github.com> (raw)
In-Reply-To: <CAPOJW5z_ZRChNo8PGBmJu=vvjTL2cYL8oTdVwoDRh-UHt2Dy4w@mail.gmail.com>
On 10/30/2022 2:35 AM, Abhradeep Chakraborty wrote:
> Hello all,
>
> It has been a month since I didn't get involved in any open source
> contributions (including Git). This is due to the fact that I was
> focusing more on mastering theories and also that it was a festive
> month. So, I am now resuming my work. There are many things I have to
> cover (including this patch series).
> But before that I want to ask you a question - As you have noticed
> already, the Roaring library has a lot of styling issues (Moreover it
> is using C11). So Should I fix all these issues? or Should I make a
> new library (using Git's compatibility library "git-compat-util.h") by
> taking CRoaring as a reference? The pros are that it would be easier
> to format the bitmap library specific files and it can use Git
> compatible functions.
>
> I would love to hear your opinions. Thanks :)
I HAVE OPINIONS! :D
Mostly, there are two things I'd like for you to keep in mind:
1. Using the library as-is is a great way to prototype and dig in on
the performance measurement side. Can you construct or clone enough
interesting repositories to get a feeling of the effect of the
roaring format compared to the EWAH format? If there is no benefit
to switching, then we can save everyone a lot of work by marking
that as an incorrect road. However, if there is sufficient evidence
that it's working well, then we have established a baseline that
the full implementation should match (at least, if not do better).
2. Once deciding to do the work, we can think about the reasons to use
the existing library over writing our own. The most basic reason is
that the library is extensively tested, so we gain all of those
benefits. Can we incorporate their test suite into our own? The
next main benefit is that we can take any changes from their version
into our code with minimal fuss. How often do you think that they
have bug fixes or enhancements in the repo? How would those changes
translate into our mailing list workflow? If we restyled the library,
then we are unlikely to get easy benefits from taking upstream
changes, but we could recreate them with manual effort.
3. After carefully considering the benefits/drawbacks of using the
existing library, consider the same for writing one from scratch.
The most important thing I will say here is that the core idea is
rather simple. There may even be ways that we can take advantage
of the format and its data structures with the expectations we have
in Git repositories that are not always possible for generic
databases. We should be able to build a much smaller library that's
limited to our needs and customized to our use case. However, we
would need to test it carefully, both for correctness and for
performance, and that is not a small undertaking.
Hopefully this gives you something to chew on. Investigating each of
these directions should help you come to a conclusion that you can
bring to the community as the expert, then we can examine your
findings to see if we agree.
Remember that code speaks. If you're willing to build it one way,
then that concrete implementation is already worth more than a
hypothetical alternative in many regards. That can be a starting
point to move forward.
Thanks,
-Stolee
next prev parent reply other threads:[~2022-10-30 19:46 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-19 17:47 [PATCH 0/5] [RFC] introduce Roaring bitmaps to Git Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 1/5] reachability-bitmaps: add CRoaring library " Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 2/5] roaring.[ch]: apply Git specific changes to the roaring API Abhradeep Chakraborty via GitGitGadget
2022-09-19 18:33 ` Derrick Stolee
2022-09-19 22:02 ` Junio C Hamano
2022-09-20 12:19 ` Derrick Stolee
2022-09-20 15:09 ` Abhradeep Chakraborty
2022-09-21 16:58 ` Junio C Hamano
2022-09-20 14:46 ` Abhradeep Chakraborty
2022-09-19 17:47 ` [PATCH 3/5] roaring: teach Git to write roaring bitmaps Abhradeep Chakraborty via GitGitGadget
2022-09-30 6:20 ` Junio C Hamano
2022-09-30 16:23 ` Abhradeep Chakraborty
2022-10-30 6:35 ` Abhradeep Chakraborty
2022-10-30 19:46 ` Derrick Stolee [this message]
2022-10-31 14:30 ` Abhradeep Chakraborty
2022-10-31 16:06 ` Junio C Hamano
2022-10-31 17:51 ` C99 -> C11 or C17? (was: [PATCH 3/5] roaring: teach Git to write roaring bitmaps) Ævar Arnfjörð Bjarmason
2022-10-31 20:55 ` rsbecker
2022-11-01 6:58 ` [PATCH 3/5] roaring: teach Git to write roaring bitmaps Abhradeep Chakraborty
2022-09-19 17:47 ` [PATCH 4/5] roaring: introduce a new config option for " Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 5/5] roaring: teach Git to read " Abhradeep Chakraborty via GitGitGadget
2022-09-19 18:18 ` [PATCH 0/5] [RFC] introduce Roaring bitmaps to Git Derrick Stolee
2022-09-20 14:05 ` Abhradeep Chakraborty
2022-09-20 21:59 ` Taylor Blau
2022-09-21 15:27 ` Abhradeep Chakraborty
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58841dcd-e732-416f-5ab0-fd5a5d8de4c7@github.com \
--to=derrickstolee@github.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=kaartic.sivaraam@gmail.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).