From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 1/12] Next-gen reference backends
Date: Mon, 2 Oct 2023 11:17:48 -0400
Message-ID: <ZRrfHJYDEfdNO4Ma@nand.local>
In-Reply-To: <ZRregi3JJXFs4Msb@nand.local>

(Presenter: Patrick Steinhardt, Notetaker: Karthik Nayak)

* Summary: There have been multiple proposals for reference backends on the
  mailing list. The goal is to converge on one solution.
* Problem: At GitLab we have certain repos with large numbers of references.
  Some repos have millions of refs, which causes scalability issues.
   * Current files backend uses a combination of loose files and packed-refs.
   * Deletion performance is bad.
   * Reference lookups are slow.
   * Storage space is also large.
   * Some patches have improved the situation, e.g. skip-list for packed-refs
     by Taylor.
   * Atomic updates are currently not possible.
   * This is not an issue faced only by GitLab.
* Two solutions proposed:
   * Reftables: Originally implemented by JGit (Shawn Pearce, 2017)
      * Google was storing the data in a table with one ref per row. This data
        was encrypted, which changes the ordering.
      * This led to the realization that the ref storage itself was not
        optimal, so Shawn made a proposal, based on existing solutions at
        Google, which was implemented in JGit.
      * This solved the ref storage problem at Google.
      * Adoption of the JGit implementation was low because of the
        compatibility requirement with CGit.
      * A new patch series has been submitted which swaps out packed-refs for
        reftables while keeping the existing file-based loose refs.
   * Incremental take on reference backend (a.k.a. packed-refs v2) by Derrick
      * Uses pre-existing infrastructure in the Git project, which makes it a
        more natural extension.
      * First part was to support a multi-backend structure.
      * Second part was packed references v2 in the Git project.
* Question: How do we take it forward from here?
   * Emily: If the existing backend existed as a library, it might be easier
     to replace and experiment with.
      * Jeff: A lot of work in that direction has already landed. But there is
        still some bleed of the implementation into other parts of the code,
        which might be messy to clean up.
      * Patrick: Different implementations by different hosting providers with
        different requirements might cause issues for clients.
   * Deletion performance is not the only issue faced (at GitLab); there are
     also deadlocks around this.
   * brian: If you have a large number of remote tracking refs you face the same
     perf issues.
   * Patrick: Is there any preference on which solution to take forward?
     GitLab is interested in picking this up, and is mostly planning to go
     forward with reftables.
   * Reftables does support tombstoning, which should solve the problem with
     multiple deletions.
      * There is still a problem with refs being a prefix of other refs.
   * Is there a world where loose refs are removed completely and replaced
     with reftables?
      * Debugging is much easier with loose refs; reftables is a binary
        format, so additional tooling might be needed here. This has already
        been proven to work at Google.
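
The deletion point above can be illustrated with a minimal sketch. It is a toy
model, not Git's or JGit's actual code or on-disk format: it assumes a
packed-refs file is a sorted list of "<oid> <refname>" lines that has to be
rewritten in full to drop one entry, and it models tombstoning as appending a
small table that only records the deletion. The names TableStack,
delete_from_packed_refs and TOMBSTONE are hypothetical and exist only for this
example.

```python
# Toy model only: not Git's or JGit's real on-disk format or API.


def delete_from_packed_refs(packed_lines, refname):
    """Delete one ref by rewriting the whole sorted file.

    Returns the new file contents and the number of lines written.
    """
    kept = [line for line in packed_lines if line.split(" ", 1)[1] != refname]
    return kept, len(kept)


class TableStack:
    """Hypothetical stack of immutable tables, newest last."""

    TOMBSTONE = None  # marker meaning "this ref was deleted"

    def __init__(self, base):
        self.tables = [dict(base)]

    def delete(self, refname):
        # Append a small new table holding only the tombstone record;
        # older tables are left untouched.
        self.tables.append({refname: self.TOMBSTONE})
        return 1  # number of records written

    def lookup(self, refname):
        # The newest table wins, so a tombstone hides older values.
        for table in reversed(self.tables):
            if refname in table:
                return table[refname]
        return None


# 1000 fake refs with made-up 40-hex-digit object IDs.
refs = {f"refs/heads/branch-{i:04d}": f"{i:040x}" for i in range(1000)}
packed = [f"{oid} {name}" for name, oid in sorted(refs.items())]

_, rewritten = delete_from_packed_refs(packed, "refs/heads/branch-0007")
stack = TableStack(refs)
appended = stack.delete("refs/heads/branch-0007")
```

In this model a single deletion rewrites every remaining line of the packed
file, while the table stack writes one record. Compaction of the stack and the
prefix problem mentioned above are deliberately left out of the sketch.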

