From: Elijah Newren <newren@gmail.com>
To: Emily Shaffer <nasamuffin@google.com>
Cc: Git List <git@vger.kernel.org>, Jonathan Nieder <jrn@google.com>,
	Jose Lopes <jabolopes@google.com>,
	Aleksandr Mikhailov <avmikhailov@google.com>
Subject: Re: Proposal/Discussion: Turning parts of Git into libraries
Date: Fri, 17 Feb 2023 20:05:00 -0800
Message-ID: <CABPp-BE6EA+vXLXJtn8CHO9pHJgLH_uh7_t7AYBRN2gAAA5C+Q@mail.gmail.com>
In-Reply-To: <CAJoAoZ=Cig_kLocxKGax31sU7Xe4==BGzC__Bg2_pr7krNq6MA@mail.gmail.com>

On Fri, Feb 17, 2023 at 1:45 PM Emily Shaffer <nasamuffin@google.com> wrote:
>
[...]
> This turned out pretty long-winded, so a quick summary before I dive in:
>
> - We want to compile parts of Git as independent libraries
> - We want to do it by making incremental code quality improvements to Git
> - Let's avoid promising stability of the interfaces of those libraries
> - We think it'll let Git do cool stuff like unit tests and allowing
> purpose-built plugins
> - Hopefully by example we can convince the rest of the project to join
> in the effort

Seems like quite reasonable high-level goals.

[...]
> The good news is that for practical near-term purposes, "libification"
> mostly means cleanups to the Git codebase, and continuing code health
> work that the project has already cared about doing:
>
> - Removing references to global variables and instead piping them
> through arguments
> - Finding and fixing memory leaks, especially in widely-used low-level code

Does removing memory leaks also mean converting UNLEAK to free()?
Thinking of things in a library context probably pushes us in that
direction (though, alternatively, it might just highlight the question
of what is considered "low-level" instead).

> - Reducing or removing `die()` invocations in low-level code, and
> instead reporting errors back to callers in a consistent way

What delineates "low-level" code?  (A "we don't know yet but we'll
start with obvious places and plan to have good discussions on the
appropriate boundary in the future as we submit patches" is a fine
answer, I'm just curious if you already have a rough idea of where you
intend that boundary to lie.)
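
To make the quoted bullet concrete, here is a minimal sketch of the two
styles side by side.  parse_count_or_die() and parse_count() are
hypothetical names used only for illustration; they are not Git
functions, and the exit code and message are assumptions modeled on
what a die()-style helper typically does.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helpers, not part of Git.
 *
 * die()-style: acceptable in a builtin, but a library caller loses
 * the whole process when the input is bad.
 */
long parse_count_or_die(const char *s)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end) {
		fprintf(stderr, "fatal: invalid count '%s'\n", s);
		exit(128);
	}
	return v;
}

/*
 * Library-style: return 0 on success and -1 on error, and leave the
 * decision about what to do next to the caller.  *out is written
 * only on success.
 */
int parse_count(const char *s, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end)
		return -1;
	*out = v;
	return 0;
}
```

Wherever the boundary ends up, code below it would look like the
second function, and a builtin could still wrap it and die() on
failure.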

> - Clarifying the scope and layering of existing modules, for example
> by moving single-use helpers from the shared module's scope into the
> single user's scope
> - Making module interfaces more consistent and easier to understand,
> including moving "private" functions out of headers and into source
> files and improving in-header documentation

I think these are very positive directions.  I like the fact that your
initial plan benefits all of us, whether or not libification is
ultimately achieved.

[...]
> So what's next? Naturally, I'm looking forward to a spirited
> discussion about this topic - I'd like to know which concerns haven't
> been addressed and figure out whether we can find a way around them,
> and generally build awareness of this effort with the community.

I'm curious whether clarifying scope/layering and cleaning up
interfaces might mean you'd be interested in things like:
  * https://github.com/newren/git/commits/header-cleanups (which was
still WIP; I paused working on it because I figured people would see
it as big "cleanup" patches with no practical benefit)
  * https://github.com/gitgitgadget/git/pull/1149 (which has been
ready to submit for a _long_ time, but I just haven't yet)
or if these two things are orthogonal to what you have in mind.

> I'm also planning to send a proposal for a document full of "best
> practices" for turning Git code into libraries (and have quite a lot
> of discussion around that document, too). My hope is that we can use
> that document to help us during implementation as well as during
> review, and refine it over time as we learn more about what works and
> what doesn't. Having this kind of documentation will make it easy for
> others to join us in moving Git's codebase towards a clean set of
> libraries. I hope that, as a project, we can settle on some tenets
> that we all agree would make Git nicer.

I like the sound of this.

> After that, we're still hoping to target low-level libraries first - I
> certainly don't think it will make sense to ship a high-level `git
> commit` library in the near future, if ever - in the order that
> they're required from the VFS project we're working closely with. As
> far as I can tell right now, that's likely to cover object store and
> worktree access, as well as commit creation and pushing, but we'll see
> how planning shakes out over the next month or so. But Google's
> schedule should have no bearing on what others in the Git project feel
> is important to clean up and libify, and if there is interest in the
> rest of the project in converting other existing modules into
> libraries, my team and I are excited to participate in the review.

If we can't libify something like commit, does that prevent libifying
higher level things like merge?

I spent some time thinking about this a while back.  I tried to
carefully design merge-ort to improve the odds it could be used
elsewhere, maybe even libgit2.  (I hope it shows in the many comments
in merge-ort.h, and I think the "priv" field in particular allowing me
to hide the first ~300 lines of merge-ort.c declaring data structures
from users was really nice.)  However, I still had to accept data in
some known format.  So input parameters are things like trees and
commits.  But tree.h and commit.h both include object.h first, which
includes cache.h, which is basically all of Git.  And the functions I
call to interoperate with the system are similarly entangled.  So, the
odds of merge-ort being reused by libgit2 or otherwise used in a
library seem essentially nil, at least without some broader
libification effort.
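
For readers unfamiliar with the "priv" pattern mentioned above, the
following is a generic sketch of it, not the actual merge-ort code.
All names (struct counter, counter_init() and so on) are hypothetical.
The public struct carries a pointer to an incomplete type, so the data
structures behind it can stay in the source file.

```c
#include <stdlib.h>

/* ---- what would live in a public header (hypothetical names) ---- */

struct counter_internal;	/* layout known only to the implementation */

struct counter {
	int step;			/* public knob, set by the caller */
	struct counter_internal *priv;	/* opaque to callers */
};

int counter_init(struct counter *c, int step);
int counter_bump(struct counter *c);
void counter_release(struct counter *c);

/* ---- what would live in the .c file ---- */

struct counter_internal {
	int value;
	unsigned long nr_bumps;
};

/* Returns 0 on success, -1 if allocation fails. */
int counter_init(struct counter *c, int step)
{
	c->step = step;
	c->priv = calloc(1, sizeof(*c->priv));
	return c->priv ? 0 : -1;
}

/* Advances the hidden value by c->step and returns the new value. */
int counter_bump(struct counter *c)
{
	c->priv->value += c->step;
	c->priv->nr_bumps++;
	return c->priv->value;
}

void counter_release(struct counter *c)
{
	free(c->priv);
	c->priv = NULL;
}
```

The hiding only covers the module's own internals.  Any types that
appear in the public function signatures still drag their headers
along, which is the entanglement described above.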

I'd like to make that story better, time permitting (which is much
more of a challenge these days than it was a couple years ago), but
I'm curious if you or others have thoughts on something like that.

> Much, much later on, I'm expecting us to form a plan around allowing
> "plugins" - that is, replacing library functionality we use today with
> an alternative library, such as an object store relying on a
> distributed file store like S3. Making that work well will also likely
> involve us coming up with a solution for dependency injection, and to
> begin using vtables for some libraries. I'm hoping that we can figure
> out a way to do that that won't make the Git source ugly. Around this
> time, I think it will make sense to buy into unit tests even more and
> start using an approach like mocking to test various edge cases. And
> at some point, it's likely that we'll want to make the interfaces to
> various Git libraries consistent with each other, which would involve
> some large-scale but hopefully-mechanical refactors.

Would these plugins resemble the pluggable merge backends that were
added to builtin/merge.c?  Would it replace that mechanism with a
different one?  Would it be more like the refs backends?
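
As a point of reference for the vtable idea in the quoted paragraph,
here is a toy sketch of a backend selected through a table of function
pointers.  It is an assumption about what such an interface might look
like, not a description of any existing or planned Git API; every name
in it (store_backend, mem_store, store_has_object) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface; none of these names exist in Git. */
struct store_backend {
	const char *name;
	int (*has_object)(void *data, const char *id);
};

/* One toy backend: a fixed array of object ids held in memory. */
struct mem_store {
	const char **ids;
	size_t nr;
};

/* Returns 1 if id is in the array, 0 otherwise. */
int mem_has_object(void *data, const char *id)
{
	struct mem_store *s = data;
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 1;
	return 0;
}

const struct store_backend mem_backend = { "memory", mem_has_object };

/*
 * Callers go through the vtable only, so another backend could be
 * swapped in without touching them.
 */
int store_has_object(const struct store_backend *be, void *data,
		     const char *id)
{
	return be->has_object(data, id);
}
```

A second backend would supply its own data struct and has_object
function and plug into store_has_object() unchanged.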

Would this plugin scheme allow us to, for example, use gitoxide[1] as
a clone replacement to make clones 2x as fast (and with half the
memory -- although I suspect they cheated and used sha1 instead of
sha1dc, so maybe it wouldn't really be 2x)?

Oh, and it's totally okay if you don't know the answers to any or all
of my questions right now.  I'm just curious, because I've long
thought these kinds of directions would be good.  Since I've spent
time thinking about it, I have questions that I don't know the answers
to, but I figured it couldn't hurt to bounce them off others who are
thinking about this area.

Anyway, it's a large pile of work that you're undertaking, and as
Junio comments elsewhere in this thread it's unclear if libification
can be achieved for a big enough component (and you seem to admit as
much in your email as well), but I applaud the general direction and
your initial plans.


[1] https://github.com/Byron/gitoxide/discussions/579


