git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Cc: Abhradeep Chakraborty via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Kaartic Sivaram <kaartic.sivaraam@gmail.com>
Subject: Re: [PATCH 3/5] roaring: teach Git to write roaring bitmaps
Date: Sun, 30 Oct 2022 15:46:20 -0400	[thread overview]
Message-ID: <58841dcd-e732-416f-5ab0-fd5a5d8de4c7@github.com> (raw)
In-Reply-To: <CAPOJW5z_ZRChNo8PGBmJu=vvjTL2cYL8oTdVwoDRh-UHt2Dy4w@mail.gmail.com>

On 10/30/2022 2:35 AM, Abhradeep Chakraborty wrote:
> Hello all,
> 
> It has been a month since I didn't get involved in any open source
> contributions (including Git). This is due to the fact that I was
> focusing more on mastering theories and also that it was a festive
> month. So, I am now resuming my work. There are many things I have to
> cover (including this patch series).
> But before that I want to ask you a question - As you have noticed
> already, the Roaring library has a lot of styling issues (Moreover it
> is using C11). So Should I fix all these issues? or Should I make a
> new library (using Git's compatibility library "git-compat-util.h") by
> taking CRoaring as a reference? The pros are that it would be easier
> to format the bitmap library specific files and it can use Git
> compatible functions.
> 
> I would love to hear your opinions. Thanks :)

I HAVE OPINIONS! :D

Mostly, there are two things I'd like for you to keep in mind:

1. Using the library as-is is a great way to prototype and dig in on
   the performance measurement side. Can you construct or clone enough
   interesting repositories to get a feeling of the effect of the
   roaring format compared to the EWAH format? If there is no benefit
   to switching, then we can save everyone a lot of work by marking
   that as an incorrect road. However, if there is sufficient evidence
   that it's working well, then we have established a baseline that
   the full implementation should match (at least, if not do better).

2. Once deciding to do the work, we can think about the reasons to use
   the existing library over writing our own. The most basic reason is
   that the library is extensively tested, so we gain all of those
   benefits. Can we incorporate their test suite into our own? The
   next main benefit is that we can take any changes from their version
   into our code with minimal fuss. How often do you think that they
   have bug fixes or enhancements in the repo? How would those changes
   translate into our mailing list workflow? If we restyled the library,
   then we are unlikely to get easy benefits from taking upstream
   changes, but we could recreate them with manual effort.

3. After carefully considering the benefits/drawbacks of using the
   existing library, consider the same for writing one from scratch.
   The most important thing I will say here is that the core idea is
   rather simple. There may even be ways that we can take advantage
   of the format and its data structures with the expectations we have
   in Git repositories that are not always possible for generic
   databases. We should be able to build a much smaller library that's
   limited to our needs and customized to our use case. However, we
   would need to test it carefully, both for correctness and for
   performance, and that is not a small undertaking.

Hopefully this gives you something to chew on. Investigating each of
these directions should help you come to a conclusion that you can
bring to the community as the expert, then we can examine your
findings to see if we agree.

Remember that code speaks. If you're willing to build it one way,
then that concrete implementation is already worth more than a
hypothetical alternative in many regards. That can be a starting
point to move forward.

Thanks,
-Stolee

  reply	other threads:[~2022-10-30 19:46 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-19 17:47 [PATCH 0/5] [RFC] introduce Roaring bitmaps to Git Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 1/5] reachability-bitmaps: add CRoaring library " Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 2/5] roaring.[ch]: apply Git specific changes to the roaring API Abhradeep Chakraborty via GitGitGadget
2022-09-19 18:33   ` Derrick Stolee
2022-09-19 22:02     ` Junio C Hamano
2022-09-20 12:19       ` Derrick Stolee
2022-09-20 15:09         ` Abhradeep Chakraborty
2022-09-21 16:58         ` Junio C Hamano
2022-09-20 14:46     ` Abhradeep Chakraborty
2022-09-19 17:47 ` [PATCH 3/5] roaring: teach Git to write roaring bitmaps Abhradeep Chakraborty via GitGitGadget
2022-09-30  6:20   ` Junio C Hamano
2022-09-30 16:23     ` Abhradeep Chakraborty
2022-10-30  6:35       ` Abhradeep Chakraborty
2022-10-30 19:46         ` Derrick Stolee [this message]
2022-10-31 14:30           ` Abhradeep Chakraborty
2022-10-31 16:06           ` Junio C Hamano
2022-10-31 17:51             ` C99 -> C11 or C17? (was: [PATCH 3/5] roaring: teach Git to write roaring bitmaps) Ævar Arnfjörð Bjarmason
2022-10-31 20:55               ` rsbecker
2022-11-01  6:58             ` [PATCH 3/5] roaring: teach Git to write roaring bitmaps Abhradeep Chakraborty
2022-09-19 17:47 ` [PATCH 4/5] roaring: introduce a new config option for " Abhradeep Chakraborty via GitGitGadget
2022-09-19 17:47 ` [PATCH 5/5] roaring: teach Git to read " Abhradeep Chakraborty via GitGitGadget
2022-09-19 18:18 ` [PATCH 0/5] [RFC] introduce Roaring bitmaps to Git Derrick Stolee
2022-09-20 14:05   ` Abhradeep Chakraborty
2022-09-20 21:59 ` Taylor Blau
2022-09-21 15:27   ` Abhradeep Chakraborty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58841dcd-e732-416f-5ab0-fd5a5d8de4c7@github.com \
    --to=derrickstolee@github.com \
    --cc=chakrabortyabhradeep79@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=kaartic.sivaraam@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).