git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Han-Wen Nienhuys <hanwen@google.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, jrnieder@gmail.com,
	John Cai <johncai86@gmail.com>
Subject: Re: [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format
Date: Fri, 2 Dec 2022 17:46:51 +0100	[thread overview]
Message-ID: <CAFQ2z_NKpgsEsrDdkdp=HDajrzpUDjiUcUdR8TMkYpXZBU0k+g@mail.gmail.com> (raw)
In-Reply-To: <f5370fec-d517-eaa9-8e16-82fa20ac8532@github.com>

On Thu, Dec 1, 2022 at 9:19 PM Derrick Stolee <derrickstolee@github.com> wrote:
> >> The reason to start with this step is that the benefits and risks are
> >> clearly understood, which can motivate us to establish the mechanism for
> >> changing the ref format by defining the extension.
> >
> > I believe that the v2 format is a safe change with performance
> > improvements, but it's a backward incompatible format change with only
> > modest payoff. I also don't understand how it will help you do a stack
> > of tables,
> > which you need for your primary goal (ie. transactions/deletions
> > writing only the delta, rather than rewriting the whole file?).
>
> The v2 format doesn't help me on its own, but it has other benefits
> in terms of size and speed, as well as the "ref count" functionality.
>
> The important thing is that the definition of extensions.refFormat
> that I'm proposing in this RFC establishes a way to make incremental
> progress on the ref format, allowing the stacked format to come in
> later with less friction.

I guess you want to move the read/write stack under the loose storage
(packed backend), and introduce (read loose/packed + write packed
only) mode that is transitional?

Before you embark on this incremental route, I think it would be best
to think through carefully how an online upgrade would work in detail
(I think it's currently not specified?) If ultimately it's not
feasible to do incrementally, then the added complexity of the
incremental approach will be for naught.

The incremental mode would only be of interest to hosting providers.
It will only be used transitionally. It is inherently going to be
complex, because it has to consider both storage modes at the same
time, and because it is transitional, it will get less real life
testing. At the same time, the ref database is comparatively small, so
the availability blip that converting the storage offline will impair
is going to be small. So, the incremental approach is rather expensive
for a comparatively small benefit.

I also thought a bit about how you could make the transition seamless,
but I can't see a good way: you have to coordinate between tables.list
(the list of reftables active or whatever file signals the presence of
a stack) and files under refs/heads/. I don't know how to do
transactions across multiple files without cooperative locking.

If you assume you can use filesystem locks, then you could do
something simpler: if a git repository is marked 'transitional', git
processes take an FS read lock on .git/ .  The process that converts
the storage can take an exclusive (write) lock on .git/, so it knows
nobody will interfere. I think this only works if the repo is on local
disk rather than NFS, though.

> * Step 1: replace packed-refs with reftable.
> * Step 2: stop writing loose refs, only update reftable (but still read loose refs).

Does that work? A long running process might not notice the switch in
step 2, so it could still write a ref as loose, while another process
racing might write a different value to the same ref through reftable.

PS. I'll be away from work until Jan 9th.
-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Liana Sebastian

  reply	other threads:[~2022-12-02 16:47 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-07 18:35 [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 01/30] hashfile: allow skipping the hash function Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 02/30] read-cache: add index.computeHash config option Derrick Stolee via GitGitGadget
2022-11-11 23:31   ` Elijah Newren
2022-11-14 16:30     ` Derrick Stolee
2022-11-17 16:13   ` Ævar Arnfjörð Bjarmason
2022-11-07 18:35 ` [PATCH 03/30] extensions: add refFormat extension Derrick Stolee via GitGitGadget
2022-11-11 23:39   ` Elijah Newren
2022-11-16 14:37     ` Derrick Stolee
2022-11-07 18:35 ` [PATCH 04/30] config: fix multi-level bulleted list Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 05/30] repository: wire ref extensions to ref backends Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 06/30] refs: allow loose files without packed-refs Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 07/30] chunk-format: number of chunks is optional Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 08/30] chunk-format: document trailing table of contents Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 09/30] chunk-format: store chunk offset during write Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 10/30] chunk-format: allow trailing table of contents Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 11/30] chunk-format: parse " Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 12/30] refs: extract packfile format to new file Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 13/30] packed-backend: extract add_write_error() Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 14/30] packed-backend: extract iterator/updates merge Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 15/30] packed-backend: create abstraction for writing refs Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 16/30] config: add config values for packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 17/30] packed-backend: create shell of v2 writes Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 18/30] packed-refs: write file format version 2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 19/30] packed-refs: read file format v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 20/30] packed-refs: read optional prefix chunks Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 21/30] packed-refs: write " Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 22/30] packed-backend: create GIT_TEST_PACKED_REFS_VERSION Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 23/30] t1409: test with packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 24/30] t5312: allow packed-refs v2 format Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 25/30] t5502: add PACKED_REFS_V1 prerequisite Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 26/30] t3210: require packed-refs v1 for some tests Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 27/30] t*: skip packed-refs v2 over http tests Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 28/30] ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 29/30] p1401: create performance test for ref operations Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 30/30] refs: skip hashing when writing packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-09 15:15 ` [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format Derrick Stolee
2022-11-11 23:28 ` Elijah Newren
2022-11-14  0:07   ` Derrick Stolee
2022-11-15  2:47     ` Elijah Newren
2022-11-16 14:45       ` Derrick Stolee
2022-11-17  4:28         ` Elijah Newren
2022-11-18 23:31     ` Junio C Hamano
2022-11-19  0:41       ` Elijah Newren
2022-11-19  3:00         ` Taylor Blau
2022-11-30 15:31       ` Derrick Stolee
2022-11-28 18:56 ` Han-Wen Nienhuys
2022-11-30 15:16   ` Derrick Stolee
2022-11-30 15:38     ` Phillip Wood
2022-11-30 16:37     ` Taylor Blau
2022-11-30 18:30     ` Han-Wen Nienhuys
2022-11-30 18:37       ` Sean Allred
2022-12-01 20:18       ` Derrick Stolee
2022-12-02 16:46         ` Han-Wen Nienhuys [this message]
2022-12-02 18:24           ` Ævar Arnfjörð Bjarmason
2022-11-30 22:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFQ2z_NKpgsEsrDdkdp=HDajrzpUDjiUcUdR8TMkYpXZBU0k+g@mail.gmail.com' \
    --to=hanwen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).