From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 1/12] Next-gen reference backends
Date: Mon, 2 Oct 2023 11:17:48 -0400
Message-ID: <ZRrfHJYDEfdNO4Ma@nand.local>
In-Reply-To: <ZRregi3JJXFs4Msb@nand.local>
(Presenter: Patrick Steinhardt, Notetaker: Karthik Nayak)
* Summary: There have been multiple proposals for reference backends on the
mailing list. The goal is to converge on one solution.
* Problem: At GitLab we have certain repos with a large number of references.
Some repos have millions of refs, which causes scalability issues.
* Current files backend uses a combination of loose files and packed-refs.
* Deletion performance is bad.
* Reference lookups are slow.
* Storage space is also large.
* Some patches have improved the situation, e.g. the skip-list for
packed-refs by Taylor.
* Atomic updates are currently not possible.
* This is not an issue faced only by GitLab.
* Two solutions proposed:
* Reftables: Originally implemented by JGit (Shawn Pearce, 2017)
* Google was storing the data in a table with one ref per row. This data
was encrypted, which changes the ordering.
* This led to the realization that the ref storage itself was not optimal,
so Shawn proposed a format based on existing solutions at Google, which
was then implemented in JGit.
* This solved the ref storage problem at Google.
* Adoption of the JGit implementation was low because of the compatibility
requirement with CGit.
* A new patch series was submitted which swaps out packed-refs for
reftables while keeping the existing file-based loose refs.
* Incremental take on reference backend (aka. packed-refs v2) by Derrick
* Uses pre-existing infrastructure in the git project. Makes it a more
natural extension.
* First part was to support a multi-backend structure.
* Second part was packed references v2 in the Git project.
* Question: How do we take it forward from here?
* Emily: If the existing backend existed as a library, it might be easier to
replace and experiment with.
* Jeff: A lot of work in that direction has already landed, but the
implementation still bleeds into other parts of the code. It might be
messy to clean up.
* Patrick: Different implementations by different hosting providers with
different requirements might cause issues for clients.
* Deletion performance is not the only issue faced (at GitLab); there are also
deadlocks around this.
* brian: If you have a large number of remote tracking refs you face the same
perf issues.
* Patrick: Is there any preference for which solution to go forward with?
GitLab is interested in picking this up and is mostly going forward with
reftables.
* Reftables do support tombstoning, which should solve the problem with
multiple deletions.
* There is still a problem with refs being a prefix of other refs.
* Is there a world where loose refs are removed completely and replaced with
reftables?
* Debugging is much easier with loose refs; reftable is a binary format.
Might need additional tooling here. This has already been proven to
work at Google.
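
To illustrate the tombstoning point above, here is a minimal sketch in
Python. It is a toy model written for these notes, not the on-disk reftable
format and not Git's or JGit's API: `RefStack`, `TOMBSTONE`, and the method
names are all hypothetical. It only shows the idea that a deletion can be
recorded by appending a small new table of tombstones rather than rewriting
one large file, with the tombstones dropped later during compaction.

```python
from typing import Dict, List, Optional

# Hypothetical marker: a ref mapped to None is treated as deleted.
TOMBSTONE = None


class RefStack:
    """Toy stack of tables; the newest table is last and shadows older ones."""

    def __init__(self) -> None:
        self.tables: List[Dict[str, Optional[str]]] = []

    def update(self, changes: Dict[str, Optional[str]]) -> None:
        # Every update appends one new table; older tables are never rewritten.
        self.tables.append(dict(changes))

    def delete(self, names: List[str]) -> None:
        # Deleting many refs costs one small table of tombstones.
        self.update({name: TOMBSTONE for name in names})

    def lookup(self, name: str) -> Optional[str]:
        # Search newest to oldest; the first table that mentions the ref wins.
        for table in reversed(self.tables):
            if name in table:
                return table[name]  # None here means "tombstoned"
        return None

    def compact(self) -> None:
        # Merge all tables oldest to newest, then drop the tombstones.
        merged: Dict[str, Optional[str]] = {}
        for table in self.tables:
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.tables = [live]


if __name__ == "__main__":
    stack = RefStack()
    stack.update({"refs/heads/main": "1111", "refs/heads/topic": "2222"})
    stack.delete(["refs/heads/topic"])
    print(stack.lookup("refs/heads/main"), stack.lookup("refs/heads/topic"))
    stack.compact()
    print(stack.tables)
```

In this model a lookup may have to consult several tables until `compact()`
runs, which is the trade-off for making deletions cheap.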