From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 02/12] Libification Goals and Progress
Date: Mon, 2 Oct 2023 11:18:15 -0400
Message-ID: <ZRrfN2lbg14IOLiK@nand.local>
In-Reply-To: <ZRregi3JJXFs4Msb@nand.local>
(Presenter: Emily Shaffer, Notetaker: Taylor Blau)
* The effort is to isolate some parts of Git into smaller, independently
buildable libraries that can be unit tested, have their implementations
swapped out, etc.
* Calvin Wan has been working on extracting a common set of interfaces, refining
the types, etc. This is in pursuit of a "standard library" implementation for
Git. Close to being shippable.
* Josh Steadmon spent some time in the second half of the year suggesting a unit
testing framework in order to test the library interfaces beyond our standard
shell tests.
* Goals:
  * Google has a couple of ways to proceed with their libification effort.
    Community input is solicited:
    * Interfaces for VFS / callable by IDE integration to avoid shelling out
    * Target libification for the sake of Git itself. Code clean-up, making
      the code more predictable / testable. Example being submodules, which
      are messy and difficult to reason about. References backend, etc.
* Is there an appetite for libification? Some particular component that would
especially benefit from clean-up, being made more test-able, hot-swappable,
etc.
* (From Emily's comment above) If others are implementing the basic references
backend via a different implementation, how do we make sure that we are
building compatible parts? Goal would be to have Git's unit tests pass against
a different API.
* (Patrick Steinhardt) For reference backends especially: would like to be able
to split between "policy" and "mechanism". This would avoid the issue
discussed in the last session where different e.g. refs backend
implementations have different behavior.
* (Emily) White-box tests for the API to make sure that different
implementations meet the policy.
* (Jonathan Nieder) For reference backends in particular, the current
implementation has an odd "layering" scheme - packed-refs today is an
incomplete backend using the same interface as the complete "loose and packed
refs" backend, serves as a mechanism without fulfilling the policy
requirements. The approach above seems like a positive change.
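
One way to picture the "policy" / "mechanism" split discussed above, as a minimal sketch rather than anything resembling Git's actual refs code: the mechanism only stores and retrieves values, a single policy layer enforces the rules, and one conformance check runs against every mechanism. All names here (`RefMechanism`, `RefStore`, `check_conformance`) and the two policy rules are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple


class RefMechanism(ABC):
    """Hypothetical storage-only interface: no validation, no policy."""

    @abstractmethod
    def read(self, name: str) -> Optional[str]: ...

    @abstractmethod
    def write(self, name: str, value: str) -> None: ...


class DictMechanism(RefMechanism):
    """Keeps refs in a plain dictionary."""

    def __init__(self) -> None:
        self._refs: Dict[str, str] = {}

    def read(self, name: str) -> Optional[str]:
        return self._refs.get(name)

    def write(self, name: str, value: str) -> None:
        self._refs[name] = value


class LogMechanism(RefMechanism):
    """Appends every update to a list; the newest entry wins on read."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []

    def read(self, name: str) -> Optional[str]:
        for logged_name, value in reversed(self._log):
            if logged_name == name:
                return value
        return None

    def write(self, name: str, value: str) -> None:
        self._log.append((name, value))


class RefStore:
    """Policy layer shared by every mechanism."""

    def __init__(self, mechanism: RefMechanism) -> None:
        self._mechanism = mechanism

    def update(self, name: str, new: str, expected_old: Optional[str] = None) -> None:
        # Illustrative policy rule 1: only names under refs/ are accepted.
        if not name.startswith("refs/"):
            raise ValueError(f"invalid ref name: {name}")
        # Illustrative policy rule 2: compare-and-swap against the old value.
        if self._mechanism.read(name) != expected_old:
            raise RuntimeError(f"{name} is not at the expected value")
        self._mechanism.write(name, new)

    def resolve(self, name: str) -> Optional[str]:
        return self._mechanism.read(name)


def check_conformance(make_mechanism: Callable[[], RefMechanism]) -> bool:
    """Run the same policy checks against one mechanism; True on success."""
    store = RefStore(make_mechanism())
    store.update("refs/heads/main", "aaa")
    store.update("refs/heads/main", "bbb", expected_old="aaa")
    assert store.resolve("refs/heads/main") == "bbb"
    for bad_call, expected_error in (
        (lambda: store.update("refs/heads/main", "ccc", expected_old="aaa"), RuntimeError),
        (lambda: store.update("HEADS/main", "ddd"), ValueError),
    ):
        try:
            bad_call()
        except expected_error:
            continue
        raise AssertionError("policy violation was accepted")
    return True
```

Because the rules live in `RefStore` only, a new mechanism cannot accidentally diverge in behavior; calling `check_conformance(DictMechanism)` and `check_conformance(LogMechanism)` exercises both with identical expectations.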
* (Emily) Are also looking into a similar project around the object store, but
have found that it is deeply intertwined throughout the rest of the code base.
Difficult to reason about, even without a library interface. Can we make any
given change safely?
  * Hunch is that it is still useful to target that sort of thing, even if
    there aren't clear boundaries.
  * In the interim, can still be part of the same compilation unit, just
    creating a clearer boundary.
* (Emily) For hosting providers and others building things on top of git, are
there parts of git functionality that you'd like to have libified so you can
get benefits without having to wait for feature lag?
* (brian) Not interested in using Git as a library in GitHub's codebase because
of license incompatibility. Would like to experiment with different algorithms
for packing and delta-fication in Rust as a highly parallel system. Would be
nice to be able to swap in something that is C-compatible. Has made changes in
libgit2 that caused libgit2 to segfault, and doesn't want to write more
segfaults.
* (Taylor) There's an effort going on in GitHub to reduce our dependency on
libgit2, precisely for the feature lag reason Emily mentions. I don't think
we're planning on using Git as a library soon; we rely on the Git
command-line interface through fork/exec.
* (Emily) Is licensing the only obstacle to using Git as a library, or are there
other practical concerns?
* (Jeff Hostetler) Pulled libgit2-sharp out of Visual Studio. Issues with
crashing, but also running into classical issues with large repositories.
Memory consumption was a real issue at the time. Safer to have memory
segmented across multiple processes so that processes can't touch each
other's memory space.
* (Emily) Interesting: thinks that performance overhead would outweigh the
memory issues.
* (Patrick) To reiterate from GitLab's point of view: we are in the same boat as
Microsoft and GitHub. Have used libgit2 extensively in the past, but were able
to drop support last month. No plans to use Git as a library in the future.
Having a process boundary is useful, avoids memory leaks, bugs in Git spilling
out to GitLab. Still have an "upstream-first" policy. Benefits everybody by
spreading the maintenance burden and ensuring that others can benefit from
such functionality.
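
The process-boundary argument made by Jeff, Taylor, and Patrick can be illustrated with a small sketch. It is not how any of the forges invoke Git; it uses the Python interpreter itself as a stand-in child command (an assumption made so the example runs without Git installed), and `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys


def run_isolated(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


# A well-behaved "command": the parent only sees its output and exit status.
ok = run_isolated("print('refs listed')")

# A "command" that dies abruptly: only the child process is lost, and the
# parent observes a non-zero exit status instead of crashing itself.
crashed = run_isolated("import os; os.abort()")

print(ok.returncode, ok.stdout.strip())
print("child failed:", crashed.returncode != 0)
```

The cost of this isolation is one process start per command, which is the overhead Emily weighs against the memory concerns above.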
* (Emily) If we had the capacity to write portions of Git's code in Rust (memory
safety, performance, use it as a library), would we want to use it?
* (Junio) I notice in the participant list people like Randall who work on
NonStop. I'd worry about the effect on minority stakeholders, portability.
* (Junio) Not fundamentally opposed to the direction.
* (Elijah) did not parallelize the C implementation of the new ORT backend.
Wanted to rewrite it in Rust, cleaned up headers as a side-effect, and looked
at other bits. Merge backends are already pluggable, could have a "normal" one
in addition to a Rust backend.
* (Emily) If we already have something in C that establishes an existing API
boundary, that makes it more tenable to rewrite it in Rust. Could say that the
C version is deprecated and make future changes to Rust.
* (brian) Thinks they would be in favor of that; is personally happy to say that
operating systems need to accept support for modern languages eventually. All
of the main Debian architectures in use have Rust ports. They are portable to
all of the main architectures. Would make it easier to do unit testing. Could
add parallelization and optimization without worrying about race conditions,
which would be a benefit. Is happy to implement unit tests with Rust's nicer
ecosystem.
* (Taylor) Is it just NonStop?
* (Elijah) Randall mentioned that they have a contractual agreement that is
supposed to expire at some point
(https://lore.kernel.org/git/004601d8ed6b$13a2f580$3ae8e080$@nexbridge.com/).
Could we have a transition plan that:
* Keeps NonStop users happy until their contract expires.
* Allows the rest of us to get up to speed with Rust.
* (Jonathan Nieder) doing this in a "self-contained module" mode with fallback C
implementation gives us the opportunity to back out in the future (at least in
the early periods while we're still learning).
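
The "self-contained module with a fallback implementation" idea can be sketched as a small backend registry. This is only an illustration under stated assumptions: the registry, the backend names, and the toy `combine_*` functions (which deduplicate lines and are not a merge algorithm) are all hypothetical and unrelated to Git's real merge machinery.

```python
from typing import Callable, Dict, List

Backend = Callable[[List[str], List[str]], List[str]]

# Hypothetical registry mapping a backend name to its implementation.
_BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str, backend: Backend) -> None:
    _BACKENDS[name] = backend


def select_backend(preferred: str, fallback: str = "reference") -> Backend:
    """Return the preferred backend, or the fallback if it is not built in."""
    return _BACKENDS.get(preferred, _BACKENDS[fallback])


def combine_reference(ours: List[str], theirs: List[str]) -> List[str]:
    """Stand-in for the existing implementation: keep first occurrences."""
    combined: List[str] = []
    for line in ours + theirs:
        if line not in combined:
            combined.append(line)
    return combined


def combine_experimental(ours: List[str], theirs: List[str]) -> List[str]:
    """Stand-in for a rewrite that must match the reference behavior."""
    return list(dict.fromkeys(ours + theirs))


# The reference backend is always available; the experimental one is optional.
register_backend("reference", combine_reference)
backend = select_backend("experimental")  # not registered yet: falls back
```

Keeping the old implementation registered means the new one can be removed again without callers noticing, and the two can be compared on the same inputs while the rewrite matures.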
* (Jonathan Tan) back to process isolation: is the short lifetime of the process
important?
* (Taylor Blau) seems like an impossible goal to be able to do multi-command
executions in a single process; the code is just not designed for it.
* (Junio) Is anybody using the `git cat-file --batch-command` mode that switches
between batch and batch-check?
* (Patrick Steinhardt) they are longer lived, but only "middle" long-lived.
GitLab limits the maximum runtime, on the order of ~minutes, at which point
they are reaped.
* (Taylor Blau) lots of problems besides memory leaks would surface.
* (Jeff Hostetler) would be nice to keep memory-hungry components pinned across
multiple command-equivalents.
* (Taylor Blau) same issue as reading configuration.