git@vger.kernel.org mailing list mirror (one of many)
* Concurrent fetch commands
@ 2023-12-31 13:30 Stefan Haller
  2023-12-31 13:48 ` Dragan Simic
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Stefan Haller @ 2023-12-31 13:30 UTC
  To: git

Currently, git doesn't seem to be very good at handling two concurrent
invocations of git fetch (or git fetch and git pull). This is a problem
because it is common for git clients to run git fetch periodically in
the background. In that case, when you happen to invoke git pull while
such a background fetch is running, an error occurs ("Cannot rebase onto
multiple branches").

I can reliably reproduce this by doing

   $ git fetch&; sleep 0.1; git pull
   [1] 42160
   [1]  + done       git fetch
   fatal: Cannot rebase onto multiple branches.

The reason for this failure seems to be that both the first fetch and
the fetch that runs as part of the pull append their information to
.git/FETCH_HEAD, so that the information for the current branch ends up
twice in the file.

Do you think git fetch should be made more robust against scenarios like
this?
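
For illustration, here is a rough model of how the duplicated entries turn into that error. The FETCH_HEAD line format (object ID, optional "not-for-merge" marker, and description, separated by tabs) and the "more than one merge head" check are simplified assumptions based on the description above, not git's actual implementation:

```python
def merge_heads(fetch_head_text):
    """Return the heads 'git pull' would try to merge/rebase onto:
    every line whose second field is not marked "not-for-merge"."""
    heads = []
    for line in fetch_head_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        flag = fields[1] if len(fields) > 1 else ""
        if flag != "not-for-merge":
            heads.append(fields[0])
    return heads

# The background fetch and the pull's own fetch both appended an entry
# for the current branch, so its head shows up twice:
fetch_head = (
    "1111111\t\tbranch 'main' of example.com:repo\n"
    "2222222\tnot-for-merge\tbranch 'topic' of example.com:repo\n"
    "1111111\t\tbranch 'main' of example.com:repo\n"
)

if len(merge_heads(fetch_head)) > 1:
    print("fatal: Cannot rebase onto multiple branches.")
```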


More context: the git client that I'm contributing to (lazygit) used to
guard against this for its own background fetch with a global mutex that
allowed only one single fetch, pull, or push at a time. This solved the
problem nicely for lazygit's own operations (at the expense of some lag,
occasionally); and I'm not aware of any reports about failures because
some other git client's background fetch got in the way, so maybe we
don't have to worry about that too much.

However, we have now removed that mutex to allow certain parallel fetch
operations to run at the same time, most notably fetching (and updating)
a branch that is not checked out (by doing "git fetch origin
branch:branch"). It is useful to be able to trigger this for multiple
branches concurrently, and in practice this works fine.

But now we have the problem described above, where a pull of the
checked-out branch runs at the same time as a background fetch; this is
not so unlikely, because lazygit triggers the first background fetch at
startup, so invoking the pull command right after starting lazygit is
very likely to fail.

We could re-introduce a mutex and just make it a little less global;
e.g. protect only pull and parameter-less fetch. But fixing it in git
itself seems preferable to me.
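
For what it's worth, the narrower mutex would be a small change on the client side. A hypothetical sketch of the idea (names invented for illustration; lazygit itself is written in Go and this is not its code):

```python
import threading

# Serialize only the operations that read/write FETCH_HEAD for the
# current branch (pull and parameter-less fetch); branch-targeted
# fetches like "git fetch origin topic:topic" stay unserialized.
fetch_head_mutex = threading.Lock()

def run_git(args):
    # Placeholder for actually spawning git; returns the command run.
    return "git " + args

def pull_or_plain_fetch(args):
    with fetch_head_mutex:   # at most one FETCH_HEAD writer at a time
        return run_git(args)

def branch_targeted_fetch(branch):
    # Safe to run concurrently: does not touch the current branch.
    return run_git(f"fetch origin {branch}:{branch}")
```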

Sorry for the wall of text, but I figured giving more context could be
useful.

Thanks,
Stefan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Concurrent fetch commands
  2023-12-31 13:30 Concurrent fetch commands Stefan Haller
@ 2023-12-31 13:48 ` Dragan Simic
  2023-12-31 13:50 ` Konstantin Tokarev
  2023-12-31 17:27 ` Junio C Hamano
  2 siblings, 0 replies; 20+ messages in thread
From: Dragan Simic @ 2023-12-31 13:48 UTC
  To: Stefan Haller; +Cc: git

On 2023-12-31 14:30, Stefan Haller wrote:
> Currently, git doesn't seem to be very good at handling two concurrent
> invocations of git fetch (or git fetch and git pull). This is a problem
> because it is common for git clients to run git fetch periodically in
> the background. In that case, when you happen to invoke git pull while
> such a background fetch is running, an error occurs ("Cannot rebase
> onto multiple branches").
> 
> I can reliably reproduce this by doing
> 
>    $ git fetch&; sleep 0.1; git pull
>    [1] 42160
>    [1]  + done       git fetch
>    fatal: Cannot rebase onto multiple branches.
> 
> The reason for this failure seems to be that both the first fetch and
> the fetch that runs as part of the pull append their information to
> .git/FETCH_HEAD, so that the information for the current branch ends up
> twice in the file.
> 
> Do you think git fetch should be made more robust against scenarios
> like this?

I believe a similar issue has already been raised recently, so perhaps
introducing some kind of file-based locking within git itself could be
justified.  It would make things a bit more robust, and would also
improve the overall user experience.

> More context: the git client that I'm contributing to (lazygit) used to
> guard against this for its own background fetch with a global mutex that
> allowed only one single fetch, pull, or push at a time. This solved the
> problem nicely for lazygit's own operations (at the expense of some lag,
> occasionally); and I'm not aware of any reports about failures because
> some other git client's background fetch got in the way, so maybe we
> don't have to worry about that too much.
> 
> However, we have now removed that mutex to allow certain parallel fetch
> operations to run at the same time, most notably fetching (and updating)
> a branch that is not checked out (by doing "git fetch origin
> branch:branch"). It is useful to be able to trigger this for multiple
> branches concurrently, and in practice this works fine.
> 
> But now we have the problem described above, where a pull of the
> checked-out branch runs at the same time as a background fetch; this is
> not so unlikely, because lazygit triggers the first background fetch at
> startup, so invoking the pull command right after starting lazygit is
> very likely to fail.
> 
> We could re-introduce a mutex and just make it a little less global;
> e.g. protect only pull and parameter-less fetch. But fixing it in git
> itself seems preferable to me.
> 
> Sorry for the wall of text, but I figured giving more context could be
> useful.



* Re: Concurrent fetch commands
  2023-12-31 13:30 Concurrent fetch commands Stefan Haller
  2023-12-31 13:48 ` Dragan Simic
@ 2023-12-31 13:50 ` Konstantin Tokarev
  2023-12-31 14:01   ` Dragan Simic
  2024-01-01 11:23   ` Stefan Haller
  2023-12-31 17:27 ` Junio C Hamano
  2 siblings, 2 replies; 20+ messages in thread
From: Konstantin Tokarev @ 2023-12-31 13:50 UTC
  To: Stefan Haller; +Cc: git

On Sun, 31 Dec 2023 14:30:05 +0100,
Stefan Haller <lists@haller-berlin.de> writes:

> Currently, git doesn't seem to be very good at handling two concurrent
> invocations of git fetch (or git fetch and git pull). This is a
> problem because it is common for git clients to run git fetch
> periodically in the background.

I think it's a really weird idea that does a disservice both to human
git users and to git servers. IMO clients doing this should be fixed to
avoid such behavior, at least by default.



* Re: Concurrent fetch commands
  2023-12-31 13:50 ` Konstantin Tokarev
@ 2023-12-31 14:01   ` Dragan Simic
  2024-01-01 11:23   ` Stefan Haller
  1 sibling, 0 replies; 20+ messages in thread
From: Dragan Simic @ 2023-12-31 14:01 UTC
  To: Konstantin Tokarev; +Cc: Stefan Haller, git

On 2023-12-31 14:50, Konstantin Tokarev wrote:
> On Sun, 31 Dec 2023 14:30:05 +0100,
> Stefan Haller <lists@haller-berlin.de> writes:
> 
>> Currently, git doesn't seem to be very good at handling two concurrent
>> invocations of git fetch (or git fetch and git pull). This is a
>> problem because it is common for git clients to run git fetch
>> periodically in the background.
> 
> I think it's a really weird idea that does a disservice both to human
> git users and to git servers. IMO clients doing this should be fixed to
> avoid such behavior, at least by default.

Could you please elaborate a bit on the resulting disservice?
Regarding fixing the clients, IDE users often expect such automatic
updating to be performed in the background, so I'm afraid there's not
much wiggle room there.



* Re: Concurrent fetch commands
  2023-12-31 13:30 Concurrent fetch commands Stefan Haller
  2023-12-31 13:48 ` Dragan Simic
  2023-12-31 13:50 ` Konstantin Tokarev
@ 2023-12-31 17:27 ` Junio C Hamano
  2023-12-31 17:41   ` Dragan Simic
                     ` (2 more replies)
  2 siblings, 3 replies; 20+ messages in thread
From: Junio C Hamano @ 2023-12-31 17:27 UTC
  To: Stefan Haller; +Cc: git

Stefan Haller <lists@haller-berlin.de> writes:

> I can reliably reproduce this by doing
>
>    $ git fetch&; sleep 0.1; git pull
>    [1] 42160
>    [1]  + done       git fetch
>    fatal: Cannot rebase onto multiple branches.

I see a bug here.

How this _ought_ to work is

 - The first "git fetch" wants to report what it fetched by writing
   into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
   the fetch finishes can consume its contents).

 - The second "git pull" runs "git fetch" under the hood.  Because
   it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
   is already somebody writing to the file, it should notice and
   barf, saying "fatal: a 'git fetch' is already working" or
   something.

But because there is no "Do not overwrite FETCH_HEAD somebody else
is using" protection, "git merge" or "git rebase" that is run as the
second half of the "git pull" ends up working on the contents of
FETCH_HEAD that is undefined, and a garbage-in, garbage-out (GIGO)
result follows.

The "bug" that the second "git fetch" does not notice an already
running one (who is in possession of FETCH_HEAD) and refrain from
starting is not easy to design a fix for---we cannot just abort by
opening it with O_CREAT|O_EXCL because it is a normal thing for
$GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
its contents before starting to avoid getting affected by contents
leftover by the last fetch, but when there is a "git fetch" that is
actively running, and it finishes _after_ the second one starts and
truncates the file, the second one will end up seeing the contents
the first one left.  We have the "--no-write-fetch-head" option for
users to explicitly tell which invocation of "git fetch" should not
write FETCH_HEAD.
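
The O_CREAT|O_EXCL dead end is easy to demonstrate in a few lines (a simplified model of the point above; real git does this from C, of course):

```python
import os
import tempfile

# Why O_CREAT|O_EXCL on FETCH_HEAD itself cannot act as a lock: the file
# legitimately survives the previous fetch, so exclusive creation fails
# even when no other fetch is running.

gitdir = tempfile.mkdtemp()
fetch_head = os.path.join(gitdir, "FETCH_HEAD")

# A previous fetch ran to completion and left FETCH_HEAD behind:
with open(fetch_head, "w") as f:
    f.write("1111111\t\tbranch 'main' of origin\n")

# A new fetch tries to take ownership via exclusive creation and fails,
# although nobody else is currently fetching:
try:
    fd = os.open(fetch_head, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    os.close(fd)
    locked = True
except FileExistsError:
    locked = False

print(locked)  # False -- exclusive creation cannot tell "stale" from "in use"
```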

Running "background/priming" fetches (the one before "sleep 0.1" you
have) is not a crime by itself, but it is a crime to run them
without the "--no-fetch-head" option.  Since you have *NO* intention
of using its contents to feed a "git merge" (or equivalent)
yourself, you are breaking your "git pull" step in your example
reproduction yourself.




* Re: Concurrent fetch commands
  2023-12-31 17:27 ` Junio C Hamano
@ 2023-12-31 17:41   ` Dragan Simic
  2024-01-01 11:30   ` Stefan Haller
  2024-01-03  7:33   ` Patrick Steinhardt
  2 siblings, 0 replies; 20+ messages in thread
From: Dragan Simic @ 2023-12-31 17:41 UTC
  To: Junio C Hamano; +Cc: Stefan Haller, git

On 2023-12-31 18:27, Junio C Hamano wrote:
> Stefan Haller <lists@haller-berlin.de> writes:
> 
>> I can reliably reproduce this by doing
>> 
>>    $ git fetch&; sleep 0.1; git pull
>>    [1] 42160
>>    [1]  + done       git fetch
>>    fatal: Cannot rebase onto multiple branches.
> 
> I see a bug here.
> 
> How this _ought_ to work is
> 
>  - The first "git fetch" wants to report what it fetched by writing
>    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
>    the fetch finishes can consume its contents).
> 
>  - The second "git pull" runs "git fetch" under the hood.  Because
>    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
>    is already somebody writing to the file, it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.
> 
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and a garbage-in, garbage-out (GIGO)
> result follows.
> 
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left.  We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.
> 
> Running "background/priming" fetches (the one before "sleep 0.1" you
> have) is not a crime by itself, but it is a crime to run them
> without the "--no-fetch-head" option.  Since you have *NO* intention
> of using its contents to feed a "git merge" (or equivalent)
> yourself, you are breaking your "git pull" step in your example
> reproduction yourself.

Thank you very much for this highly detailed explanation.



* Re: Concurrent fetch commands
  2023-12-31 13:50 ` Konstantin Tokarev
  2023-12-31 14:01   ` Dragan Simic
@ 2024-01-01 11:23   ` Stefan Haller
  2024-01-01 15:47     ` Federico Kircheis
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Haller @ 2024-01-01 11:23 UTC
  To: Konstantin Tokarev; +Cc: git

On 31.12.23 14:50, Konstantin Tokarev wrote:
> On Sun, 31 Dec 2023 14:30:05 +0100,
> Stefan Haller <lists@haller-berlin.de> writes:
> 
>> ... it is common for git clients to run git fetch
>> periodically in the background.
> 
> I think it's a really weird idea that does a disservice both to human
> git users and to git servers. IMO clients doing this should be fixed to
> avoid such behavior, at least by default.

I disagree pretty strongly. Users expect this, and I do find it very
convenient myself.



* Re: Concurrent fetch commands
  2023-12-31 17:27 ` Junio C Hamano
  2023-12-31 17:41   ` Dragan Simic
@ 2024-01-01 11:30   ` Stefan Haller
  2024-01-01 11:42     ` Stefan Haller
  2024-01-03  7:33   ` Patrick Steinhardt
  2 siblings, 1 reply; 20+ messages in thread
From: Stefan Haller @ 2024-01-01 11:30 UTC
  To: Junio C Hamano; +Cc: git

On 31.12.23 18:27, Junio C Hamano wrote:
> How this _ought_ to work is
> 
>    ... it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.

Interesting, I had expected this to work somehow, e.g. by sequencing the
operations or whatever is necessary to make it work. Fixing the bug as
you suggest actually makes very little difference in practice; it just
gives a slightly less confusing error message.

> but it is a crime to run them without the "--no-write-fetch-head" option.

Ouch, I wasn't aware we are committing crimes. We'll accept the
punishment. :-)

But it does sound like --no-write-fetch-head will solve our problem,
thanks!

-Stefan



* Re: Concurrent fetch commands
  2024-01-01 11:30   ` Stefan Haller
@ 2024-01-01 11:42     ` Stefan Haller
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Haller @ 2024-01-01 11:42 UTC
  To: Junio C Hamano; +Cc: git

On 01.01.24 12:30, Stefan Haller wrote:
> But it does sound like --no-write-fetch-head will solve our problem,
> thanks!

Actually, we saw another problem recently that I suspect might also be
caused by the concurrency between fetch and pull, but I'm not sure. I'll
explain it here in case anybody has any ideas.

What happened was that a user tried to pull a branch that was behind its
upstream (not diverged). They got the error message from
show_advice_pull_non_ff ("Need to specify how to reconcile divergent
branches"); the log then showed that the background fetch was ongoing
for the remote of the current branch while they initiated the pull.

Trying to pull again after the background fetch had moved on to the next
remote then worked.

I read the code in pull.c a bit, but I can't see how it can become so
confused about being diverged in this scenario. Any ideas?

-Stefan



* Re: Concurrent fetch commands
  2024-01-01 11:23   ` Stefan Haller
@ 2024-01-01 15:47     ` Federico Kircheis
  0 siblings, 0 replies; 20+ messages in thread
From: Federico Kircheis @ 2024-01-01 15:47 UTC
  To: git

On 01/01/2024 12.23, Stefan Haller wrote:
> On 31.12.23 14:50, Konstantin Tokarev wrote:
>> On Sun, 31 Dec 2023 14:30:05 +0100,
>> Stefan Haller <lists@haller-berlin.de> writes:
>>
>>> ... it is common for git clients to run git fetch
>>> periodically in the background.
>>
>> I think it's a really weird idea that does a disservice both to human
>> git users and to git servers. IMO clients doing this should be fixed to
>> avoid such behavior, at least by default.
> 
> I disagree pretty strongly. Users expect this, and I do find it very
> convenient myself.
> 

Especially when working with git worktree.



* Re: Concurrent fetch commands
  2023-12-31 17:27 ` Junio C Hamano
  2023-12-31 17:41   ` Dragan Simic
  2024-01-01 11:30   ` Stefan Haller
@ 2024-01-03  7:33   ` Patrick Steinhardt
  2024-01-03  8:11     ` Patrick Steinhardt
  2 siblings, 1 reply; 20+ messages in thread
From: Patrick Steinhardt @ 2024-01-03  7:33 UTC
  To: Junio C Hamano; +Cc: Stefan Haller, git


On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> Stefan Haller <lists@haller-berlin.de> writes:
> 
> > I can reliably reproduce this by doing
> >
> >    $ git fetch&; sleep 0.1; git pull
> >    [1] 42160
> >    [1]  + done       git fetch
> >    fatal: Cannot rebase onto multiple branches.
> 
> I see a bug here.
> 
> How this _ought_ to work is
> 
>  - The first "git fetch" wants to report what it fetched by writing
>    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
>    the fetch finishes can consume its contents).
> 
>  - The second "git pull" runs "git fetch" under the hood.  Because
>    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
>    is already somebody writing to the file, it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.
> 
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and a garbage-in, garbage-out (GIGO)
> result follows.
> 
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left.  We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.

While I agree that it's the right thing to use "--no-write-fetch-head"
in this context, I still wonder whether we want to fix this "bug". It
would be a rather easy change on our side to start using the lockfile
API to write to FETCH_HEAD, which has a bunch of benefits:

  - We would block concurrent processes from writing to FETCH_HEAD at
    the same time (well, at least for clients aware of the new semantics).

  - Consequently, we do not write a corrupted FETCH_HEAD anymore when
    multiple processes write to it at the same time.

  - We're also more robust against corruption in the context of hard
    crashes due to atomic rename semantics and proper flushing.

I don't really see much of a downside except for the fact that we change
how this special ref is written, so other implementations would need to
adapt accordingly. But even if they didn't, and clients with both the
new and the old behaviour wrote FETCH_HEAD at the same point in time,
the result would still be a consistent FETCH_HEAD once both writes
finish. There would then be a race over which of the two versions of
FETCH_HEAD we see, but that feels better than corrupted contents to me.
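
For illustration, the lock-write-rename pattern could look roughly like this (a sketch of the general pattern, with assumed file names; git's actual lockfile API lives in C and differs in detail):

```python
import os
import tempfile

def write_fetch_head(gitdir, content):
    """Write FETCH_HEAD via an exclusive .lock file plus atomic rename."""
    target = os.path.join(gitdir, "FETCH_HEAD")
    lock = target + ".lock"
    # Exclusive creation: if another writer holds the lock, this raises
    # FileExistsError and we can bail out with a clear message instead
    # of interleaving writes into the same file.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        os.write(fd, content.encode())
        os.fsync(fd)          # robustness against hard crashes
    finally:
        os.close(fd)
    os.rename(lock, target)   # atomically commit the new contents

gitdir = tempfile.mkdtemp()
write_fetch_head(gitdir, "1111111\t\tbranch 'main' of origin\n")
```

Readers then always see either the old or the new FETCH_HEAD in full, never a partial write, which is the corruption-resistance benefit listed above.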

Patrick



* Re: Concurrent fetch commands
  2024-01-03  7:33   ` Patrick Steinhardt
@ 2024-01-03  8:11     ` Patrick Steinhardt
       [not found]       ` <ZZU1TCyQdLqoLxPw@ugly>
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick Steinhardt @ 2024-01-03  8:11 UTC
  To: Junio C Hamano; +Cc: Stefan Haller, git


On Wed, Jan 03, 2024 at 08:33:24AM +0100, Patrick Steinhardt wrote:
> On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> > Stefan Haller <lists@haller-berlin.de> writes:
> > 
> > > I can reliably reproduce this by doing
> > >
> > >    $ git fetch&; sleep 0.1; git pull
> > >    [1] 42160
> > >    [1]  + done       git fetch
> > >    fatal: Cannot rebase onto multiple branches.
> > 
> > I see a bug here.
> > 
> > How this _ought_ to work is
> > 
> >  - The first "git fetch" wants to report what it fetched by writing
> >    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> >    the fetch finishes can consume its contents).
> > 
> >  - The second "git pull" runs "git fetch" under the hood.  Because
> >    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> >    is already somebody writing to the file, it should notice and
> >    barf, saying "fatal: a 'git fetch' is already working" or
> >    something.
> > 
> > But because there is no "Do not overwrite FETCH_HEAD somebody else
> > is using" protection, "git merge" or "git rebase" that is run as the
> > second half of the "git pull" ends up working on the contents of
> > FETCH_HEAD that is undefined, and a garbage-in, garbage-out (GIGO)
> > result follows.
> > 
> > The "bug" that the second "git fetch" does not notice an already
> > running one (who is in possession of FETCH_HEAD) and refrain from
> > starting is not easy to design a fix for---we cannot just abort by
> > opening it with O_CREAT|O_EXCL because it is a normal thing for
> > $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> > its contents before starting to avoid getting affected by contents
> > leftover by the last fetch, but when there is a "git fetch" that is
> > actively running, and it finishes _after_ the second one starts and
> > truncates the file, the second one will end up seeing the contents
> > the first one left.  We have the "--no-write-fetch-head" option for
> > users to explicitly tell which invocation of "git fetch" should not
> > write FETCH_HEAD.
> 
> While I agree that it's the right thing to use "--no-write-fetch-head"
> in this context, I still wonder whether we want to fix this "bug". It
> would be a rather easy change on our side to start using the lockfile
> API to write to FETCH_HEAD, which has a bunch of benefits:
> 
>   - We would block concurrent processes from writing to FETCH_HEAD at
>     the same time (well, at least for clients aware of the new semantics).
> 
>   - Consequently, we do not write a corrupted FETCH_HEAD anymore when
>     multiple processes write to it at the same time.
> 
>   - We're also more robust against corruption in the context of hard
>     crashes due to atomic rename semantics and proper flushing.
> 
> I don't really see much of a downside except for the fact that we change
> how this special ref is written, so other implementations would need to
> adapt accordingly. But even if they didn't, and clients with both the
> new and the old behaviour wrote FETCH_HEAD at the same point in time,
> the result would still be a consistent FETCH_HEAD once both writes
> finish. There would then be a race over which of the two versions of
> FETCH_HEAD we see, but that feels better than corrupted contents to me.

Ah, one thing I didn't think of is parallel fetches. It's expected that
all of the fetches write into FETCH_HEAD concurrently, so the end result
is the collection of all updated refs even though the order isn't
guaranteed. That indeed makes things a bit more complicated.
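
To see why the parallel case still works despite the nondeterministic order: append-mode writes never clobber each other, so every child's entry survives. A toy model, with file descriptors standing in for the child processes (not git code):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "FETCH_HEAD")
open(path, "w").close()   # main process truncates FETCH_HEAD first

# Each "child" gets its own append-mode descriptor:
children = [os.open(path, os.O_WRONLY | os.O_APPEND) for _ in range(3)]
entries = [
    f"{i}{i}{i}\tnot-for-merge\tbranch 'b{i}' of origin\n" for i in range(3)
]

# Writes land in whatever order the children finish; O_APPEND positions
# every write at end-of-file, so no entry overwrites another:
for fd, line in zip(reversed(children), entries):
    os.write(fd, line.encode())
for fd in children:
    os.close(fd)

lines = open(path).read().splitlines()
print(len(lines))  # 3 -- all entries recorded, only the order may vary
```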

Patrick



* Re: Concurrent fetch commands
       [not found]       ` <ZZU1TCyQdLqoLxPw@ugly>
@ 2024-01-03 10:40         ` Patrick Steinhardt
  2024-01-03 16:40           ` Taylor Blau
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick Steinhardt @ 2024-01-03 10:40 UTC
  To: Oswald Buddenhagen; +Cc: Junio C Hamano, Stefan Haller, git


On Wed, Jan 03, 2024 at 11:22:04AM +0100, Oswald Buddenhagen wrote:
> On Wed, Jan 03, 2024 at 09:11:07AM +0100, Patrick Steinhardt wrote:
> > Ah, one thing I didn't think of is parallel fetches. It's expected that
> > all of the fetches write into FETCH_HEAD at the same point in time
> > concurrently
> > 
> is it, though? given that the contents could already be randomly
> scrambled, it would not seem particularly bad if the behavior changed.
> 
> the one real complication i see is the --append option, which requires using
> a waiting lock after the actual fetch, rather than acquiring it immediately
> and erroring out on failure (and ideally giving a hint to use
> --no-write-fetch-head).

I should probably clarify: with "parallel fetches" I meant `git fetch
--jobs=<n>`, not two separate executions of git-fetch(1). And these do
in fact use `--append` internally: the main process first truncates
FETCH_HEAD and then spawns its children, which then append to
FETCH_HEAD in nondeterministic order.

But even though the order is nondeterministic, I wouldn't go as far as
claiming that the complete feature is broken. It works and records all
updated refs in FETCH_HEAD just fine, even if it's not particularly
elegant. Which to me shows that we should try hard not to break it.

> an extra complication is that concurrent runs with and without --append
> should be precluded, because that would again result in undefined behavior.
> it generally seems tricky to get --append straight if parallel fetches are
> supposed to work.

Yeah, the `--append` flag indeed complicates things. There are two ways
to handle this:

  - `--append` should refrain from running when there is a lockfile.
    This breaks `git fetch --jobs` without extra infra to handle this
    case, and furthermore a user may (rightfully?) expect that two
    manually spawned `git fetch --append` processes should work just
    fine.

  - `--append` should handle concurrency just fine, that is, it knows to
    append to a preexisting lockfile. This is messy though, and the
    original creator of the lockfile wouldn't know when it can commit it
    into place.

Both options are kind of ugly, so I'm less sure now whether lockfiles
are the way to go.

Patrick



* Re: Concurrent fetch commands
  2024-01-03 10:40         ` Patrick Steinhardt
@ 2024-01-03 16:40           ` Taylor Blau
  2024-01-03 22:10             ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2024-01-03 16:40 UTC
  To: Patrick Steinhardt; +Cc: Oswald Buddenhagen, Junio C Hamano, Stefan Haller, git

On Wed, Jan 03, 2024 at 11:40:51AM +0100, Patrick Steinhardt wrote:
>   - `--append` should handle concurrency just fine, that is, it knows to
>     append to a preexisting lockfile. This is messy though, and the
>     original creator of the lockfile wouldn't know when it can commit it
>     into place.
>
> Both options are kind of ugly, so I'm less sure now whether lockfiles
> are the way to go.

Interesting. Thinking a little bit about what you wrote here, I feel
like `--append[=<FETCH_HEAD>]` would do what you need. The creator
of the lockfile would commit it into place exactly when all children
have finished writing into the existing lockfile.

It seems like that could work, but I haven't poked around to figure out
whether or not that is the case. Regardless, supposing that it does
work, I wonder what users reasonably expect in the presence of multiple
'git fetch' operations. I suppose the answer is that they expect
concurrent fetches to be tolerated, but that the contents of FETCH_HEAD
(and of course the remote references) are consistent at the end of all
of the fetches.

Thanks,
Taylor



* Re: Concurrent fetch commands
  2024-01-03 16:40           ` Taylor Blau
@ 2024-01-03 22:10             ` Junio C Hamano
  2024-01-04 12:01               ` Stefan Haller
  2024-01-04 17:34               ` Taylor Blau
  0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2024-01-03 22:10 UTC
  To: Taylor Blau; +Cc: Patrick Steinhardt, Oswald Buddenhagen, Stefan Haller, git

Taylor Blau <me@ttaylorr.com> writes:

> ... I suppose the answer is that they expect
> concurrent fetches to be tolerated, but that the contents of FETCH_HEAD
> (and of course the remote references) are consistent at the end of all
> of the fetches.

What does it mean to be "consistent" in this case, though?  For the
controlled form of multiple fetches performed by "git fetch --all",
the answer is probably "as if we fetched sequentially from these
remotes, one by one, and concatenated what these individual fetch
invocations left in FETCH_HEAD".  But for an uncontrolled background
fetch IDE and others perform behind user's back, it is unclear what
it means, or for that matter, it is dubious if there is a reasonable
definition for the word.

Folks who invented "git maintenance" designed their "prefetch" task
to follow best practice, without interfering with any foreground
fetches by not touching FETCH_HEAD and the remote-tracking branches.

Nobody has brought up the latter so far in this discussion thread, but
mucking with the remote-tracking branches behind the user's back means
completely breaking the end-user expectation that --force-with-lease
would do something useful even when it is not given the commit the
user expects to see at the remote.  Perhaps those third-party tools
that want to run "git fetch" in the background can learn from how the
"prefetch" task works to avoid the breakage they are inflicting on
their users?






* Re: Concurrent fetch commands
  2024-01-03 22:10             ` Junio C Hamano
@ 2024-01-04 12:01               ` Stefan Haller
  2024-01-04 20:54                 ` Mike Hommey
  2024-01-04 17:34               ` Taylor Blau
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Haller @ 2024-01-04 12:01 UTC
  To: Junio C Hamano, Taylor Blau; +Cc: Patrick Steinhardt, Oswald Buddenhagen, git

On 03.01.24 23:10, Junio C Hamano wrote:
> Folks who invented "git maintenance" designed their "prefetch" task
> to follow best practice, without interfering with any foreground
> fetches by not touching FETCH_HEAD and the remote-tracking branches.

That's good, but it's for a very different purpose than an IDE's
background fetch. git maintenance's prefetch is just to improve
performance for the next pull; the point of an IDE's background fetch is
to show me which of my remote branches have new stuff that I might be
interested in pulling, without having to fetch myself. So I *want* this
to be mucking with my remote-tracking branches.
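For that use case, the closest sketch is a fetch that does update the
remote-tracking branches but skips FETCH_HEAD, which removes at least
the FETCH_HEAD collision reproduced at the start of the thread (the
remote name "origin" is an assumption; --no-write-fetch-head needs
git >= 2.29):

```shell
# IDE-style background fetch: refs/remotes/origin/* ARE updated --
# that is the point -- but FETCH_HEAD is left alone, so a concurrent
# foreground "git pull" does not see duplicated entries for its branch.
# Assumes a remote named "origin" and git >= 2.29.
git fetch --no-write-fetch-head --quiet origin
```

This does not address the --force-with-lease concern, which is about
the remote-tracking refs themselves moving behind the user's back.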

> Nobody has brought up the latter on this discussion thread so far, but
> mucking with the remote-tracking branches behind the user's back
> completely breaks the end-user expectation that --force-with-lease
> will do something useful even when it is not given the commit the
> user expects to see at the remote.

That's an interesting point that indeed hasn't been brought up yet.
However, don't we all agree that --force-with-lease without specifying a
commit is not a good idea anyway, in general? That's why
--force-if-includes was invented, isn't it?
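The difference between the forms can be sketched as follows (the
branch name "main" and remote "origin" are assumptions for the sake of
the example; --force-if-includes needs git >= 2.30):

```shell
# Weak form: the lease defaults to our remote-tracking ref, which a
# background fetch may have silently advanced -- the safety net is gone.
git push --force-with-lease origin main

# Stronger alternative: pin the lease to a tip we actually looked at,
# captured before starting the rewrite, not whatever the
# remote-tracking ref happens to say now.
expected=$(git rev-parse refs/remotes/origin/main)
git push --force-with-lease=main:"$expected" origin main

# Or keep the bare lease but additionally require that the current
# remote tip is contained in what we are pushing (git >= 2.30).
git push --force-with-lease --force-if-includes origin main
```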

> Perhaps those third-party tools
> that want to run "git fetch" in the background can learn from how the
> "prefetch" task works, to avoid the breakage they are inflicting on
> their users?

Again, what you call "breakage" here is the very point of a background
fetch for me, so I don't want it to be avoided. (For FETCH_HEAD, yes,
and I learned that we have --no-write-fetch-head for that, but for
remote-tracking branches, no.)



* Re: Concurrent fetch commands
  2024-01-03 22:10             ` Junio C Hamano
  2024-01-04 12:01               ` Stefan Haller
@ 2024-01-04 17:34               ` Taylor Blau
  1 sibling, 0 replies; 20+ messages in thread
From: Taylor Blau @ 2024-01-04 17:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, Oswald Buddenhagen, Stefan Haller, git

On Wed, Jan 03, 2024 at 02:10:56PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > ... I suppose the answer is that they expect
> > concurrent fetches to be tolerated, but that the contents of FETCH_HEAD
> > (and of course the remote references) are consistent at the end of all
> > of the fetches.
>
> What does it mean to be "consistent" in this case, though?  For the
> controlled form of multiple fetches performed by "git fetch --all",
> the answer is probably "as if we fetched sequentially from these
> remotes, one by one, and concatenated what these individual fetch
> invocations left in FETCH_HEAD".  But for an uncontrolled background
> fetch that IDEs and other tools perform behind the user's back, it is
> unclear what that means; for that matter, it is dubious whether there
> is a reasonable definition for the word at all.

Yeah, on thinking on it more I tend to agree here.

> Nobody has brought up the latter on this discussion thread so far, but
> mucking with the remote-tracking branches behind the user's back
> completely breaks the end-user expectation that --force-with-lease
> will do something useful even when it is not given the commit the
> user expects to see at the remote.  Perhaps those third-party tools
> that want to run "git fetch" in the background can learn from how the
> "prefetch" task works, to avoid the breakage they are inflicting on
> their users?

Probably so.

Thanks,
Taylor



* Re: Concurrent fetch commands
  2024-01-04 12:01               ` Stefan Haller
@ 2024-01-04 20:54                 ` Mike Hommey
  2024-01-04 22:14                   ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Hommey @ 2024-01-04 20:54 UTC (permalink / raw)
  To: Stefan Haller
  Cc: Junio C Hamano, Taylor Blau, Patrick Steinhardt,
	Oswald Buddenhagen, git

On Thu, Jan 04, 2024 at 01:01:26PM +0100, Stefan Haller wrote:
> On 03.01.24 23:10, Junio C Hamano wrote:
> > Folks who invented "git maintenance" designed their "prefetch" task
> > to follow best practice: it avoids interfering with any foreground
> > fetches by touching neither FETCH_HEAD nor the remote-tracking branches.
> 
> That's good, but it's for a very different purpose than an IDE's
> background fetch. git maintenance's prefetch is just to improve
> performance for the next pull; the point of an IDE's background fetch is
> to show me which of my remote branches have new stuff that I might be
> interested in pulling, without having to fetch myself. So I *want* this
> to be mucking with my remote-tracking branches.

Use `git remote update`?

Mike



* Re: Concurrent fetch commands
  2024-01-04 20:54                 ` Mike Hommey
@ 2024-01-04 22:14                   ` Junio C Hamano
  2024-01-04 22:25                     ` Mike Hommey
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2024-01-04 22:14 UTC (permalink / raw)
  To: Mike Hommey
  Cc: Stefan Haller, Taylor Blau, Patrick Steinhardt,
	Oswald Buddenhagen, git

Mike Hommey <mh@glandium.org> writes:

> On Thu, Jan 04, 2024 at 01:01:26PM +0100, Stefan Haller wrote:
>> On 03.01.24 23:10, Junio C Hamano wrote:
>> > Folks who invented "git maintenance" designed their "prefetch" task
>> > to follow best practice: it avoids interfering with any foreground
>> > fetches by touching neither FETCH_HEAD nor the remote-tracking branches.
>> 
>> That's good, but it's for a very different purpose than an IDE's
>> background fetch. git maintenance's prefetch is just to improve
>> performance for the next pull; the point of an IDE's background fetch is
>> to show me which of my remote branches have new stuff that I might be
>> interested in pulling, without having to fetch myself. So I *want* this
>> to be mucking with my remote-tracking branches.
>
> Use `git remote update`?

Hmph, it seems that it does not pass "--no-write-fetch-head", so it
would interfere with the foreground "fetch" or "pull" the end user
consciously runs, in exactly the same way as Stefan's demonstration in
the first message of the thread, no?



* Re: Concurrent fetch commands
  2024-01-04 22:14                   ` Junio C Hamano
@ 2024-01-04 22:25                     ` Mike Hommey
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Hommey @ 2024-01-04 22:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Haller, Taylor Blau, Patrick Steinhardt,
	Oswald Buddenhagen, git

On Thu, Jan 04, 2024 at 02:14:50PM -0800, Junio C Hamano wrote:
> Mike Hommey <mh@glandium.org> writes:
> 
> > On Thu, Jan 04, 2024 at 01:01:26PM +0100, Stefan Haller wrote:
> >> On 03.01.24 23:10, Junio C Hamano wrote:
> >> > Folks who invented "git maintenance" designed their "prefetch" task
> >> > to follow best practice: it avoids interfering with any foreground
> >> > fetches by touching neither FETCH_HEAD nor the remote-tracking branches.
> >> 
> >> That's good, but it's for a very different purpose than an IDE's
> >> background fetch. git maintenance's prefetch is just to improve
> >> performance for the next pull; the point of an IDE's background fetch is
> >> to show me which of my remote branches have new stuff that I might be
> >> interested in pulling, without having to fetch myself. So I *want* this
> >> to be mucking with my remote-tracking branches.
> >
> > Use `git remote update`?
> 
> Hmph, it seems that it does not pass "--no-write-fetch-head", so it
> would interfere with the foreground "fetch" or "pull" the end user
> consciously runs, in exactly the same way as Stefan's demonstration in
> the first message of the thread, no?

Interesting, I never realized it updated FETCH_HEAD, but it does. All
the entries are marked "not-for-merge", though, whatever that implies.
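What it implies: "git pull" only merges or rebases the FETCH_HEAD
entries that are *not* marked not-for-merge, so marked entries are
harmless to the merge step; the failure at the start of the thread
comes from the for-merge entry for the current branch being recorded
twice. A quick way to see the marker (the layout in the comments is
illustrative, not captured output):

```shell
# Inspect what the last fetch recorded.  Lines carrying the
# "not-for-merge" marker are ignored by the merge/rebase step of
# "git pull"; only the unmarked line(s) for the current branch count.
cat .git/FETCH_HEAD
# Illustrative layout:
#   <object-id>                 branch 'main' of https://example.com/repo
#   <object-id>  not-for-merge  branch 'topic' of https://example.com/repo
```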

Mike



end of thread, other threads:[~2024-01-04 22:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2023-12-31 13:30 Concurrent fetch commands Stefan Haller
2023-12-31 13:48 ` Dragan Simic
2023-12-31 13:50 ` Konstantin Tokarev
2023-12-31 14:01   ` Dragan Simic
2024-01-01 11:23   ` Stefan Haller
2024-01-01 15:47     ` Federico Kircheis
2023-12-31 17:27 ` Junio C Hamano
2023-12-31 17:41   ` Dragan Simic
2024-01-01 11:30   ` Stefan Haller
2024-01-01 11:42     ` Stefan Haller
2024-01-03  7:33   ` Patrick Steinhardt
2024-01-03  8:11     ` Patrick Steinhardt
     [not found]       ` <ZZU1TCyQdLqoLxPw@ugly>
2024-01-03 10:40         ` Patrick Steinhardt
2024-01-03 16:40           ` Taylor Blau
2024-01-03 22:10             ` Junio C Hamano
2024-01-04 12:01               ` Stefan Haller
2024-01-04 20:54                 ` Mike Hommey
2024-01-04 22:14                   ` Junio C Hamano
2024-01-04 22:25                     ` Mike Hommey
2024-01-04 17:34               ` Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
