git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org
Subject: Re: Pathological performance with git remote rename and many tracking refs
Date: Fri, 15 Apr 2022 14:26:16 +0200	[thread overview]
Message-ID: <220415.868rs6cyal.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YljFnJk55WYLKd6Y@camp.crustytoothpaste.net>


On Fri, Apr 15 2022, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> On 2022-04-14 at 07:12:20, Ævar Arnfjörð Bjarmason wrote:
>> Aside from how we'd do renames with transactions, do you know about
>> clone.defaultRemoteName and --origin?
>
> Yes, I do know about that.  However, in my case, the repository is
> cloned before I even get a chance to touch it, so these options have no
> effect.  My dotfiles aren't even on the machine at that point.

*nod*

>> There was a (small) thread as a follow-up to that "rename --progress"
>> patch at the time, did you spot/read that?:
>> https://lore.kernel.org/git/220302.865yow6u8a.gmgdl@evledraar.gmail.com/
>
> Yeah, I remember reading it at the time.
>
>> More generally, probably:
>> 
>>  1. Teach transactions about N operations on the same refname, which
>>     they'll currently die on, renames are one case.
>> 
>>  2. Be able to "hook in" to them, updating reflogs is one special-case,
>>     but we have the same inherent issue with updating config in lockstep
>>     with transactions.
>
> Yeah, that's one of the reasons I was thinking of implementing
> --no-reflogs: because in that case, the operation really is a
> create/delete operation and it doesn't require any additional logic in
> the transaction.
>
> My goal here is specifically not to rearchitect ref transactions and to
> implement a relatively simple solution.  Your response is indicating to
> me that updating reflogs is going to be the former and not the latter.

Yes, I think it would be nicer to solve the general problem, because
we'd otherwise end up with no-reflogs and have-reflogs paths through
these functions.

I'd also think you'd be stuck trying to instrument transactions in some
new way anyway, because wouldn't this thing be racy if reflogs for one
of the remotes appeared after you started you'd need to abort or deal
with them?

Or was the plan to:

 1. Create lock files for them all
 2. No reflogs?
 3. Rename..

Maybe that was what we did in any case, I'm rather rusty on the exact
procedure at the moment. I.e. did we do the mass-lock first?

Anyway.

I actually wonder if this wouldn't be that hard, because if we have any
reflogs we could simply "replay" them by turning (again, I'm a bit rusty
on the exact lock dance);

    lock(refs/remotes/origin/master);
    lock(refs/remotes/def/master);
    create(refs/remotes/def/master, refs/remotes/origin/master^{});
    delete(refs/remotes/origin/master);
    unlock(refs/remotes/origin/master);
    unlock(refs/remotes/def/master);

Into instead doing:

    lock(refs/remotes/origin/master);
    lock(refs/remotes/def/master);
    for from, to, msg ref_update(refs/remotes/origin/master):
    update(refs/remotes/def/master, from, to, msg);
    delete(refs/remotes/origin/master);
    unlock(refs/remotes/origin/master);
    unlock(refs/remotes/def/master);

I.e. simply replay each individual item in the reflog, basically using
the code in your builtin/stash.c series.

I didn't think of that before, and just thought of some
extra-transactional way to copy the reflog over as-is.

I think that multi-copy approach would run into
ref_update_reject_duplicates(), i.e. you can't have multiple updates
now, but that's a sanity assertion to make sure the transaction is
internally consistent.

In this case the N update() calls are semnatically the same as a
create() call in terms of aborting the transaction, i.e. we'd "fold"
them away, i.e. turn the sanity check into pretending N such updates
were the same as a single create(), so we'd delete on rollback.

Maybe that would work? Hrm...

Anyway, depending on the answer to if your --no-reflogs plan would have
any lock issues etc, perhaps an alternate plan would be:

    git remote rename --unsafe

Where we'd just "mv" the 1-2 relevant refs/remotes/X and
log/refs/remotes/X folders, followed by updating refs...

  reply	other threads:[~2022-04-15 12:43 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 22:32 Pathological performance with git remote rename and many tracking refs brian m. carlson
2022-04-14  7:12 ` Ævar Arnfjörð Bjarmason
2022-04-15  1:08   ` brian m. carlson
2022-04-15 12:26     ` Ævar Arnfjörð Bjarmason [this message]
2022-04-15 17:25       ` Junio C Hamano
2022-04-16 11:23         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220415.868rs6cyal.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).