git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>, git@vger.kernel.org
Subject: Re: Pathological performance with git remote rename and many tracking refs
Date: Sat, 16 Apr 2022 13:23:28 +0200	[thread overview]
Message-ID: <220416.86h76t9s41.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <xmqqfsmedzsp.fsf@gitster.g>


On Fri, Apr 15 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I actually wonder if this wouldn't be that hard, because if we have any
>> reflogs we could simply "replay" them by turning (again, I'm a bit rusty
>> on the exact lock dance);
>>
>>     lock(refs/remotes/origin/master);
>>     lock(refs/remotes/def/master);
>>     create(refs/remotes/def/master, refs/remotes/origin/master^{});
>
> Why deref?  Trying to be prepared for seeing a symbolic ref, or
> something?

I just meant "$(git rev-parse $NAME)", sorry about being unclear. It
would work for branches, but yeah, for annotated tag objects it would be
the wrong thing.

>>     delete(refs/remotes/origin/master);
>>     unlock(refs/remotes/origin/master);
>>     unlock(refs/remotes/def/master);
>>
>> Into instead doing:
>>
>>     lock(refs/remotes/origin/master);
>>     lock(refs/remotes/def/master);
>>     for from, to, msg ref_update(refs/remotes/origin/master):
>>         update(refs/remotes/def/master, from, to, msg);
>>     delete(refs/remotes/origin/master);
>>     unlock(refs/remotes/origin/master);
>>     unlock(refs/remotes/def/master);
>
> Would this (or the above, for that matter) work well when renaming 'foo'
> to 'foo/bar' (or the other way around), I wonder?

Hrm, probably not. Also for that we'll have tricky collisions with the
2nd level (master) v.s. 1st (origin/) won't we?

> Is the reasoning behind "replay each and every reflog entry" because
> you are trying to avoid depending on the implementation detail (i.e.
> "R=.git/logs/refs/remotes; mv $R/origin/master $R/def/master" works
> only with the files backend)?

It's a suggested workaround around the ref backend not understanding how
to "plug in" someting like a "mv", i.e. we can represent this sort of
thing as a sequence of normal ref operations.

> Unless we are willing to add a new
> ref backend method to help this natively, it would be a workable but
> an ultra-slow way to do so, as it would involve open, write, fsync,
> and close for each reflog entry.
>
>     ... goes and looks ...

It's far from optimal, but I think the main slowdown is due to the
(re)writing of the packed refs file, whereas the churn of doing a bunch
of locks, append-only writes etc. should be small by comparison, I
haven't benchmarked it though...

Also for my own reflogs (especially of remotes) some have very busy
logs, and some just 1-2 entries...

> Hmph, I am utterly confused.  refs/files-backend.c eventually
> dispatches to files_copy_or_rename_ref() when rename_ref method in
> ref_storage_be is used on the files backend, and the function
> already renames the reflog file without having to copy entry by
> entry.  We cannot just open a transaction, run many rename_refs and
> close it?

The whole problem (such as it is) is that that isn't part of the
transaction mechanism.

I.e. all of the logic to make this failure tolerant is missing, you're
renaming N refs and their logs, and we fail in the middle of it, does it
roll back to a consistent state? (it doesn't).

Of course this code could be made to handle such failures, address
consistency issues etc., but doing so would essentially be a
re-implementation of transactions.

So I'm wondering (perhaps unproductively) if having some way to
piggy-back the mechanism into the transaction mechanism in an easy way
would be a good way to do it...

      reply	other threads:[~2022-04-16 11:37 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 22:32 Pathological performance with git remote rename and many tracking refs brian m. carlson
2022-04-14  7:12 ` Ævar Arnfjörð Bjarmason
2022-04-15  1:08   ` brian m. carlson
2022-04-15 12:26     ` Ævar Arnfjörð Bjarmason
2022-04-15 17:25       ` Junio C Hamano
2022-04-16 11:23         ` Ævar Arnfjörð Bjarmason [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220416.86h76t9s41.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).