git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* suspected race between packing and fetch (single case study)
       [not found] <fe9babc8-a3ee-6be4-e4f8-9690cb7c79bd@fz-juelich.de>
@ 2021-01-08 16:39 ` Adina Wagner
  2021-01-08 18:48   ` Taylor Blau
  0 siblings, 1 reply; 7+ messages in thread
From: Adina Wagner @ 2021-01-08 16:39 UTC (permalink / raw)
  To: git

Hi,


colleagues encouraged me to report a "personal" bug I've stumbled
across. Its "personal", because I wasn't able to create a minimal
reproducer, or even reproduce it with the same script on other
infrastructure. We're suspecting a race between packing and fetch. The
script I am using is at the bottom of the email.

The script creates a joint Git/git-annex repository A with a large
number of objects. Afterwards, a repository B is created, and A is
cloned into it.
Cloning fails initially. Errors look like this:

+ git clone --progress ../A /tmp/B/subds
Cloning into '/tmp/B/subds'...
fatal: failed to copy file to
'/tmp/B/subds/.git/objects/44/93d6041a44b5a7280875ec9b6ecd78fbab7b6e':
No such file or directory

Running "ps aux -H | grep git" before and after cloning shows garbage
collection and packing processes in repo A. We're suspecting that there
is a race. Here is script output that shows the processes:
+ cd B
+ ps aux -H
+ grep git
adina     674763  0.0  0.0   6152   836 pts/5    S+   16:38
0:00           grep git
adina     674071  0.0  0.0   9584  2788 ?        Ss   16:38   0:00
/usr/lib/git-core/git gc --auto --no-quiet
adina     674072  0.0  0.0   9584  3884 ?        S    16:38   0:00
/usr/lib/git-core/git repack -d -l --no-write-bitmap-index
adina     674073  149  0.1 583780 20564 ?        R    16:38
0:02         /usr/lib/git-core/git pack-objects --local
--delta-base-offset .git/objects/pack/.tmp-674072-pack
--keep-true-parents --honor-pack-keep --non-empty --all --reflog
--indexed-objects --unpacked  --incremental
+ git clone --progress ../A /tmp/B/subds
Cloning into '/tmp/B/subds'...
fatal: failed to copy file to
'/tmp/B/subds/.git/objects/14/5a4c6775684788ecf51e5d745ac19ad5b204e3':
No such file or directory
+ ps aux -H
+ grep git
adina     674774  0.0  0.0   6152   896 pts/5    S+   16:38
0:00           grep git
adina     674071  0.0  0.0   9584  2788 ?        Ss   16:38   0:00
/usr/lib/git-core/git gc --auto --no-quiet
adina     674072 11.0  0.0  11160  3884 ?        R    16:38   0:00
/usr/lib/git-core/git repack -d -l --no-write-bitmap-index
bash script.sh  65.71s user 29.53s system 94% cpu 1:40.71 total



Both A and B are completely sane repositories, git fsck shows nothing
out of the ordinary, I can clone them fine in any situation but the
scripted workflow. If I add a short "sleep" between creating A and
cloning A into B the error vanishes.

I have been able to trigger this reliably for a month with the script. I
am running git version 2.29.2 (but also saw this when downgrading to
version 2.24) on Debian testing (bullseye). Other than simply waiting a
bit before the clone, setting git config --global gc.autodetach false
removes the bug, too.

I wonder if there is a way that Git could guard cases where background
gc processes may still be running?


For completeness, here is the script I am using to trigger this on my
machine. We didn't manage to reproduce the behavior on another machine,
and I didn't find a more minimal example (sorry :( ). The script
involves datalad (which uses git-annex):

#!/bin/sh

set -x

# this creates a joint git/git-annex repository
datalad create A && cd A
# this adds adds and extracts a tarball with ~13.000 JPEGs to the
repository. Data is added to git annex.
datalad download-url \
     --archive \
     --message "Download Imagenette dataset" \
     'https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz'
# this creates another joint git/git-annex repository
cd ../ && datalad create B
cd B
ps aux -H | grep git
git clone --progress ../A /tmp/B/subds
ps aux -H | grep git


Kind regards,
Adina



------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: suspected race between packing and fetch (single case study)
  2021-01-08 16:39 ` suspected race between packing and fetch (single case study) Adina Wagner
@ 2021-01-08 18:48   ` Taylor Blau
  2021-01-09 22:11     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Taylor Blau @ 2021-01-08 18:48 UTC (permalink / raw)
  To: Adina Wagner; +Cc: git

Hi Adina,

On Fri, Jan 08, 2021 at 05:39:12PM +0100, Adina Wagner wrote:
> Hi,
>
>
> colleagues encouraged me to report a "personal" bug I've stumbled
> across. Its "personal", because I wasn't able to create a minimal
> reproducer, or even reproduce it with the same script on other
> infrastructure. We're suspecting a race between packing and fetch. The
> script I am using is at the bottom of the email.

Indeed, similar races between fetching and repacking are known. For
example, this discussion:

  https://lore.kernel.org/git/20200316082348.GA26581@inner.h.apk.li/

is about the .idx going away during a fetch. A similar thing is
happening here, but instead of the .idx file going away, your source
repository is repacking (and thus getting rid of loose object files).

Here, I think the issue is less complicated. Since you're cloning from a
local repository, the 'git clone' command calls 'clone_local()', which
in turn calls 'copy_or_link_directory()'. If the directory being copied
changes while being iterated over, the receiving end isn't guaranteed to
pick up the changes.

Worse, if the source _removes_ a file that hasn't yet been copied, over,
then the copy will fail, which is what you're seeing here.

One workaround would be to clone your repositories locally with
'--shared', which won't copy any objects from the source repository, but
instead mark its object store as an alternate to the newly created one.

> I wonder if there is a way that Git could guard cases where background
> gc processes may still be running?

Perhaps Git could take some sort of lock when writing to the object
store, but an flock wouldn't work since we'd want to allow multiple
readers to acquire the lock simultaneously, so long as there is no
writer.


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: suspected race between packing and fetch (single case study)
  2021-01-08 18:48   ` Taylor Blau
@ 2021-01-09 22:11     ` Junio C Hamano
  2021-01-11 19:25       ` Taylor Blau
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2021-01-09 22:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Adina Wagner, git

Taylor Blau <me@ttaylorr.com> writes:

> Here, I think the issue is less complicated. Since you're cloning from a
> local repository, the 'git clone' command calls 'clone_local()', which
> in turn calls 'copy_or_link_directory()'. If the directory being copied
> changes while being iterated over, the receiving end isn't guaranteed to
> pick up the changes.
>
> Worse, if the source _removes_ a file that hasn't yet been copied, over,
> then the copy will fail, which is what you're seeing here.

And the source that removes a file during a repack would create a
new file to keep the contents of the removed file available (if the
object still matters after the repack), but because we do not retry
our "cp -r" equivalent used in the clone_local(), we may not pick
such a new file up.

So, we probalby should document "git clone --local" that the user
should expect fallout similar to what may happen when they copy a
directory hierarchy with "cp -r src dst" and muck with what is in
"src" while the copy is ongoing.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: suspected race between packing and fetch (single case study)
  2021-01-09 22:11     ` Junio C Hamano
@ 2021-01-11 19:25       ` Taylor Blau
  2021-01-12 17:46         ` yoh
  0 siblings, 1 reply; 7+ messages in thread
From: Taylor Blau @ 2021-01-11 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Adina Wagner, git

On Sat, Jan 09, 2021 at 02:11:55PM -0800, Junio C Hamano wrote:
> So, we probalby should document "git clone --local" that the user
> should expect fallout similar to what may happen when they copy a
> directory hierarchy with "cp -r src dst" and muck with what is in
> "src" while the copy is ongoing.

Mm, good idea. Below the cut line is a patch to do just that. I thought
briefly about documenting it in the pack-protocol page, but it only
mentions the local transport in passing, so it seemed inappropriate to
add that much more detail there.

--- 8< ---

Subject: [PATCH] Documentation/git-clone.txt: document race with --local

When running 'git clone --local', the operation may fail if another
process is modifying the source repository. Document that this race
condition is known to hopefully help anyone who may run into it.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-clone.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 876aedcd47..02d9c19cec 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -57,6 +57,10 @@ repository is specified as a URL, then this flag is ignored (and we
 never use the local optimizations).  Specifying `--no-local` will
 override the default when `/path/to/repo` is given, using the regular
 Git transport instead.
++
+*NOTE*: this operation can race with concurrent modification to the
+source repository, similar to running `cp -r src dst` while modifying
+`src`.

 --no-hardlinks::
 	Force the cloning process from a repository on a local
--
2.30.0.138.g6d7191ea01


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: suspected race between packing and fetch (single case study)
  2021-01-11 19:25       ` Taylor Blau
@ 2021-01-12 17:46         ` yoh
  2021-01-12 18:47           ` Taylor Blau
  0 siblings, 1 reply; 7+ messages in thread
From: yoh @ 2021-01-12 17:46 UTC (permalink / raw)
  To: git


On Mon, 11 Jan 2021, Taylor Blau wrote:
> ++
> +*NOTE*: this operation can race with concurrent modification to the
> +source repository, similar to running `cp -r src dst` while modifying
> +`src`.

Couldn't `gc` be triggered by git in seemingly read-only operations,
thus possibly ruining the analogy with `cp` while doing `rm` (explicit
intent to modify)?

Moreover, situation is also a bit different since a sane user script
would not place `rm` into background to keep operating on original
source right before doing `cp` -- and that is what is happening here:

`git` operation is presumably complete (but leaves `gc` running in the
background) and script advances to the next step only to run into a race
condition with that preceding `git` command which apparently triggered
`gc`.  Should then any script which operates on local `git` repositories
not to forget to add   -c gc.autodetach=0  for every git
invocation which might be potentially effected?

Cheers,
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik        


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: suspected race between packing and fetch (single case study)
  2021-01-12 17:46         ` yoh
@ 2021-01-12 18:47           ` Taylor Blau
  2021-01-13 14:55             ` yoh
  0 siblings, 1 reply; 7+ messages in thread
From: Taylor Blau @ 2021-01-12 18:47 UTC (permalink / raw)
  To: yoh; +Cc: git

On Tue, Jan 12, 2021 at 12:46:22PM -0500, yoh@onerussian.com wrote:
>
> On Mon, 11 Jan 2021, Taylor Blau wrote:
> > ++
> > +*NOTE*: this operation can race with concurrent modification to the
> > +source repository, similar to running `cp -r src dst` while modifying
> > +`src`.
>
> Couldn't `gc` be triggered by git in seemingly read-only operations,
> thus possibly ruining the analogy with `cp` while doing `rm` (explicit
> intent to modify)?
>
> Moreover, situation is also a bit different since a sane user script
> would not place `rm` into background to keep operating on original
> source right before doing `cp` -- and that is what is happening here:

If you're suggesting that something is missing from the above patch, I'm
not sure I quite understand what you would like added.

All of these (background gc, explicit rm-ing) fall under the category of
"concurrent modification": they are changing the source directory in
some way while a read operation is taking place.

> `git` operation is presumably complete (but leaves `gc` running in the
> background) and script advances to the next step only to run into a race
> condition with that preceding `git` command which apparently triggered
> `gc`.  Should then any script which operates on local `git` repositories
> not to forget to add   -c gc.autodetach=0  for every git
> invocation which might be potentially effected?

If your workflow is that you are frequently cloning via the local
transport and there is no other synchronization going on between
whatever work is happening in the source repository, then yes. (But note
of course that you can set gc.autodetach=0 via the source repository's
.git/config rather than typing it each time).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: suspected race between packing and fetch (single case study)
  2021-01-12 18:47           ` Taylor Blau
@ 2021-01-13 14:55             ` yoh
  0 siblings, 0 replies; 7+ messages in thread
From: yoh @ 2021-01-13 14:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Adina Wagner


On Tue, 12 Jan 2021, Taylor Blau wrote:
> > > ++
> > > +*NOTE*: this operation can race with concurrent modification to the
> > > +source repository, similar to running `cp -r src dst` while modifying
> > > +`src`.

> > Couldn't `gc` be triggered by git in seemingly read-only operations,
> > thus possibly ruining the analogy with `cp` while doing `rm` (explicit
> > intent to modify)?

> > Moreover, situation is also a bit different since a sane user script
> > would not place `rm` into background to keep operating on original
> > source right before doing `cp` -- and that is what is happening here:

> If you're suggesting that something is missing from the above patch, I'm
> not sure I quite understand what you would like added.

Slept on it.  I think your patch (doc disclaimer) is factually correct
and probably as good as it can get.  Not yet sure if it is worth
explicit mentioning `gc` or `repack` as one of such concurrent
operations.

> All of these (background gc, explicit rm-ing) fall under the category of
> "concurrent modification": they are changing the source directory in
> some way while a read operation is taking place.

yes.  My comment was more on how such modifications are triggered: via
explicit actions (e.g. `rm`) intended to modify vs as a "house
keeping running in the background", which is the case of gc in
particular when triggered by seemingly read-only operations.

> > `git` operation is presumably complete (but leaves `gc` running in the
> > background) and script advances to the next step only to run into a race
> > condition with that preceding `git` command which apparently triggered
> > `gc`.  Should then any script which operates on local `git` repositories
> > not to forget to add   -c gc.autodetach=0  for every git
> > invocation which might be potentially effected?

> If your workflow is that you are frequently cloning via the local
> transport and there is no other synchronization going on between
> whatever work is happening in the source repository, then yes. (But note
> of course that you can set gc.autodetach=0 via the source repository's
> .git/config rather than typing it each time).

IMHO it affects efficiency, become cumbersome (for git users), and thus
might be error-prone: e.g.  gc.autodetach=0 is necessity only to
mitigate only for a possible subsequent `clone` invocation operating
locally.  Higher level constructs siting on top of `git` would not know
what is the next command ran in the user script (like in our case of
datalad) to set such config variable for their invocations.  Adding
gc.autodetach=0 to every single `git` invocation would effect our
efficiency. User might not be made aware of such necessity for using
`git clone` on local repositories, only after having their scripts
deployed and at some random points in time start hitting the race
condition and go "google" and RTFM mode to figure out what is
going on.

That is why I am more in-line with your initial comment  in
https://lore.kernel.org/git/X%2FipCPFyW3gAWrHo@nand.local/ :

> Perhaps Git could take some sort of lock when writing to the object
> store, but an flock wouldn't work since we'd want to allow multiple
> readers to acquire the lock simultaneously, so long as there is no
> writer.

I think it would be nice to have `clone_local()` first check that
there is no ongoing modifications happening  before proceeding and wait
some reasonable amount of time (up to ?0 sec?) if still ongoing, and
then fail "informatively" if still cannot clone.  Even though it would
not prevent race condition in full (`clone_local` might check and
initiate, and then some process starts altering while `clone_local` is
ongoing), it would mitigate any scripted cases of a local `git clone`
following some heavy manipulations of original repository which triggers
background gc.

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik        


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-01-13 14:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <fe9babc8-a3ee-6be4-e4f8-9690cb7c79bd@fz-juelich.de>
2021-01-08 16:39 ` suspected race between packing and fetch (single case study) Adina Wagner
2021-01-08 18:48   ` Taylor Blau
2021-01-09 22:11     ` Junio C Hamano
2021-01-11 19:25       ` Taylor Blau
2021-01-12 17:46         ` yoh
2021-01-12 18:47           ` Taylor Blau
2021-01-13 14:55             ` yoh

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).