git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Elijah Newren <newren@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, jrnieder@gmail.com
Subject: Re: [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format
Date: Wed, 16 Nov 2022 09:45:13 -0500	[thread overview]
Message-ID: <01063560-8f57-4e40-5707-f8d8ecfe6cca@github.com> (raw)
In-Reply-To: <CABPp-BFNvUQx7exLgqDvzhgn1s=xSFKbJWdr8qfxLTXEFDQQig@mail.gmail.com>

On 11/14/22 9:47 PM, Elijah Newren wrote:
> On Sun, Nov 13, 2022 at 4:07 PM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 11/11/22 6:28 PM, Elijah Newren wrote:
>>> On Mon, Nov 7, 2022 at 11:01 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>>>
>>>> Introduction
>>>> ============
>>>>
>>>> I became interested in our packed-ref format based on the asymmetry between
>>>> ref updates and ref deletions: if we delete a packed ref, then the
>>>> packed-refs file needs to be rewritten. Compared to writing a loose ref,
>>>> this is an O(N) cost instead of O(1).
>>>>
>>>> In this way, I set out with some goals:
>>>>
>>>>  * (Primary) Make packed ref deletions be nearly as fast as loose ref
>>>>    updates.
>>>
>>> Performance is always nice.  :-)
>>>
>>>>  * (Secondary) Allow using a packed ref format for all refs, dropping loose
>>>>    refs and creating a clear way to snapshot all refs at a given point in
>>>>    time.
>>>
>>> Is this secondary goal the actual goal you have, or just the
>>> implementation by which you get the real underlying goal?
>>
>> To me, the primary goal takes precedence. It turns out that the best
>> way to solve for that goal happens to also make it possible to store
>> all refs in a packed form, because we can update the packed form
>> much faster than our current setup. There are alternatives that I
>> considered (and prototyped) that were more specific to the deletions
>> case, but they were not actually as fast as the stacked method. Those
>> alternatives also would never help reach the secondary goal, but I
>> probably would have considered them anyway if they were faster, if
>> only for their simplicity.
> 
> That's orthogonal to my question, though.  For your primary goal, you
> stated it in a form where it was obvious what benefit it would provide
> to end users.  Your secondary goal, as stated, didn't list any benefit
> to end users that I could see (update: reading the rest of your
> response it appears I just didn't understand it), so I was trying to
> guess at why your secondary goal might be a goal, i.e. what the real
> secondary goal was.

The reason is in the goal "creating a clear way to snapshot all refs
at a given point in time". This is a server-side benefit with no
visible benefit to users, immediately.

The D/F conflicts and case-sensitive parts that could fall from that
are not included in my goals. Part of that is because we would need a
new reflog format to complete that part. Let's take things one step
at a time and handle reflogs after we have ref update performance
handled.

>>> To me, it appears that such a capability would solve both (a) D/F
>>> conflict problems (i.e. the ability to simultaneously have a
>>> refs/heads/feature and refs/heads/feature/shiny ref), and (b) case
>>> sensitivity issues in refnames (i.e. inability of some users to work
>>> with both a refs/heads/feature and a refs/heads/FeAtUrE, due to
>>> constraints of their filesystem and the loose storage mechanism).  Are
>>> either of those the goal you are trying to achieve (I think both would
>>> be really nice, more so than the performance goal you have), or is
>>> there another?
>>
>> For a Git host provider, these D/F conflict and case-sensitivity
>> situations probably would need to stay as restrictions on the
>> server side for quite some time because we don't want users on
>> older Git clients to be unable to fetch a repository just because
>> we updated our ref storage to allow for such possibilities.
> 
> Okay, but even if not used on the server side, this capability could
> still be used on the client side and provide a big benefit to end
> users.
> 
> But I think there's a minor issue with what you stated; as far as I
> can tell, there is no case-sensitivity restriction on the server side
> for GitHub currently, and users do currently have problems cloning and
> using repositories with branches that differ in case only.  See e.g.
> https://github.com/newren/git-filter-repo/issues/48 and the multiple
> duplicates which reference that issue.  We've also had issues at
> $DAYJOB, though for GHE we added some hooks to deny creating branches
> that differ only in case from another branch to avoid the problem.

Yes, you're right here. We could do better in rejecting case-sensitive
matches upon request.

> Also, D/F restrictions on the server do not stop users from having D/F
> problems when fetching.  If users forget to use `--prune`, then when a
> refs/heads/foo has already been fetched is deleted and replaced by a
> refs/heads/foo/bar, then the user gets errors.  This issue actually
> caused a bit of a fire-drill for us just recently.

And similar to the case-sensitive situation, I'm not sure if we have
checks to avoid D/F conflicts if they happen across the loose/packed
boundary. We might just be using the filesystem as a constraint. I'll
need to dig in more here.

(This is all the more reason why this space is already complicated and
will take some time to unwind.)

> So both kinds of problems already exist, for users with any git client
> version (although the former only for users with unfortunate file
> systems).  And both problems cause pain.  Both issues are caused by
> loose refs, so limiting git storage to packed refs would fix both
> issues.
> 
>> The biggest benefit on the server side is actually for consistency
>> checks. Using a stacked packed-refs (especially with a tip file
>> that describes all of the layers) allows an atomic way to take a
>> snapshot of the refs and run a checksum operation on their values.
>> With loose refs, concurrent updates can modify the checksum during
>> its computation. This is a super niche reason for this, but it's
>> nice that the performance-only focus also ends up with a design
>> that satisfies this goal.
> 
> Ah...so this is the reason for your secondary goal?  Re-reading it
> looks like you did state this, I just missed it without the longer
> explanation.
> 
> Anyway, it might be worth calling out in your cover letter that there
> are (at least) three benefits to this secondary goal of yours -- the
> one you list here, plus the two I list above.

I suppose I assumed that the D/F and case conflicts were a "known"
benefit and a huge motivation of the reftable work. Instead of trying
to solve all of the ref problems at once, I wanted to focus on the
subset that I knew could be solved with a simpler solution, leaving
the full solution to later steps. It would help to be explicit about
how this direction helps solve this problem while also being clear
about how it does not solve it completely.

Thanks,
-Stolee

  reply	other threads:[~2022-11-16 14:46 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-07 18:35 [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 01/30] hashfile: allow skipping the hash function Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 02/30] read-cache: add index.computeHash config option Derrick Stolee via GitGitGadget
2022-11-11 23:31   ` Elijah Newren
2022-11-14 16:30     ` Derrick Stolee
2022-11-17 16:13   ` Ævar Arnfjörð Bjarmason
2022-11-07 18:35 ` [PATCH 03/30] extensions: add refFormat extension Derrick Stolee via GitGitGadget
2022-11-11 23:39   ` Elijah Newren
2022-11-16 14:37     ` Derrick Stolee
2022-11-07 18:35 ` [PATCH 04/30] config: fix multi-level bulleted list Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 05/30] repository: wire ref extensions to ref backends Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 06/30] refs: allow loose files without packed-refs Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 07/30] chunk-format: number of chunks is optional Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 08/30] chunk-format: document trailing table of contents Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 09/30] chunk-format: store chunk offset during write Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 10/30] chunk-format: allow trailing table of contents Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 11/30] chunk-format: parse " Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 12/30] refs: extract packfile format to new file Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 13/30] packed-backend: extract add_write_error() Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 14/30] packed-backend: extract iterator/updates merge Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 15/30] packed-backend: create abstraction for writing refs Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 16/30] config: add config values for packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 17/30] packed-backend: create shell of v2 writes Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 18/30] packed-refs: write file format version 2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 19/30] packed-refs: read file format v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 20/30] packed-refs: read optional prefix chunks Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 21/30] packed-refs: write " Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 22/30] packed-backend: create GIT_TEST_PACKED_REFS_VERSION Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 23/30] t1409: test with packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 24/30] t5312: allow packed-refs v2 format Derrick Stolee via GitGitGadget
2022-11-07 18:35 ` [PATCH 25/30] t5502: add PACKED_REFS_V1 prerequisite Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 26/30] t3210: require packed-refs v1 for some tests Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 27/30] t*: skip packed-refs v2 over http tests Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 28/30] ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 29/30] p1401: create performance test for ref operations Derrick Stolee via GitGitGadget
2022-11-07 18:36 ` [PATCH 30/30] refs: skip hashing when writing packed-refs v2 Derrick Stolee via GitGitGadget
2022-11-09 15:15 ` [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format Derrick Stolee
2022-11-11 23:28 ` Elijah Newren
2022-11-14  0:07   ` Derrick Stolee
2022-11-15  2:47     ` Elijah Newren
2022-11-16 14:45       ` Derrick Stolee [this message]
2022-11-17  4:28         ` Elijah Newren
2022-11-18 23:31     ` Junio C Hamano
2022-11-19  0:41       ` Elijah Newren
2022-11-19  3:00         ` Taylor Blau
2022-11-30 15:31       ` Derrick Stolee
2022-11-28 18:56 ` Han-Wen Nienhuys
2022-11-30 15:16   ` Derrick Stolee
2022-11-30 15:38     ` Phillip Wood
2022-11-30 16:37     ` Taylor Blau
2022-11-30 18:30     ` Han-Wen Nienhuys
2022-11-30 18:37       ` Sean Allred
2022-12-01 20:18       ` Derrick Stolee
2022-12-02 16:46         ` Han-Wen Nienhuys
2022-12-02 18:24           ` Ævar Arnfjörð Bjarmason
2022-11-30 22:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01063560-8f57-4e40-5707-f8d8ecfe6cca@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).