* suspected race between packing and fetch (single case study) [not found] <fe9babc8-a3ee-6be4-e4f8-9690cb7c79bd@fz-juelich.de> @ 2021-01-08 16:39 ` Adina Wagner 2021-01-08 18:48 ` Taylor Blau 0 siblings, 1 reply; 7+ messages in thread From: Adina Wagner @ 2021-01-08 16:39 UTC (permalink / raw) To: git Hi, colleagues encouraged me to report a "personal" bug I've stumbled across. Its "personal", because I wasn't able to create a minimal reproducer, or even reproduce it with the same script on other infrastructure. We're suspecting a race between packing and fetch. The script I am using is at the bottom of the email. The script creates a joint Git/git-annex repository A with a large number of objects. Afterwards, a repository B is created, and A is cloned into it. Cloning fails initially. Errors look like this: + git clone --progress ../A /tmp/B/subds Cloning into '/tmp/B/subds'... fatal: failed to copy file to '/tmp/B/subds/.git/objects/44/93d6041a44b5a7280875ec9b6ecd78fbab7b6e': No such file or directory Running "ps aux -H | grep git" before and after cloning shows garbage collection and packing processes in repo A. We're suspecting that there is a race. Here is script output that shows the processes: + cd B + ps aux -H + grep git adina 674763 0.0 0.0 6152 836 pts/5 S+ 16:38 0:00 grep git adina 674071 0.0 0.0 9584 2788 ? Ss 16:38 0:00 /usr/lib/git-core/git gc --auto --no-quiet adina 674072 0.0 0.0 9584 3884 ? S 16:38 0:00 /usr/lib/git-core/git repack -d -l --no-write-bitmap-index adina 674073 149 0.1 583780 20564 ? R 16:38 0:02 /usr/lib/git-core/git pack-objects --local --delta-base-offset .git/objects/pack/.tmp-674072-pack --keep-true-parents --honor-pack-keep --non-empty --all --reflog --indexed-objects --unpacked --incremental + git clone --progress ../A /tmp/B/subds Cloning into '/tmp/B/subds'... fatal: failed to copy file to '/tmp/B/subds/.git/objects/14/5a4c6775684788ecf51e5d745ac19ad5b204e3': No such file or directory + ps aux -H + grep git adina 674774 0.0 0.0 6152 896 pts/5 S+ 16:38 0:00 grep git adina 674071 0.0 0.0 9584 2788 ? Ss 16:38 0:00 /usr/lib/git-core/git gc --auto --no-quiet adina 674072 11.0 0.0 11160 3884 ? R 16:38 0:00 /usr/lib/git-core/git repack -d -l --no-write-bitmap-index bash script.sh 65.71s user 29.53s system 94% cpu 1:40.71 total Both A and B are completely sane repositories, git fsck shows nothing out of the ordinary, I can clone them fine in any situation but the scripted workflow. If I add a short "sleep" between creating A and cloning A into B the error vanishes. I have been able to trigger this reliably for a month with the script. I am running git version 2.29.2 (but also saw this when downgrading to version 2.24) on Debian testing (bullseye). Other than simply waiting a bit before the clone, setting git config --global gc.autodetach false removes the bug, too. I wonder if there is a way that Git could guard cases where background gc processes may still be running? For completeness, here is the script I am using to trigger this on my machine. We didn't manage to reproduce the behavior on another machine, and I didn't find a more minimal example (sorry :( ). The script involves datalad (which uses git-annex): #!/bin/sh set -x # this creates a joint git/git-annex repository datalad create A && cd A # this adds adds and extracts a tarball with ~13.000 JPEGs to the repository. Data is added to git annex. datalad download-url \ --archive \ --message "Download Imagenette dataset" \ 'https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz' # this creates another joint git/git-annex repository cd ../ && datalad create B cd B ps aux -H | grep git git clone --progress ../A /tmp/B/subds ps aux -H | grep git Kind regards, Adina ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: suspected race between packing and fetch (single case study) 2021-01-08 16:39 ` suspected race between packing and fetch (single case study) Adina Wagner @ 2021-01-08 18:48 ` Taylor Blau 2021-01-09 22:11 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Taylor Blau @ 2021-01-08 18:48 UTC (permalink / raw) To: Adina Wagner; +Cc: git Hi Adina, On Fri, Jan 08, 2021 at 05:39:12PM +0100, Adina Wagner wrote: > Hi, > > > colleagues encouraged me to report a "personal" bug I've stumbled > across. Its "personal", because I wasn't able to create a minimal > reproducer, or even reproduce it with the same script on other > infrastructure. We're suspecting a race between packing and fetch. The > script I am using is at the bottom of the email. Indeed, similar races between fetching and repacking are known. For example, this discussion: https://lore.kernel.org/git/20200316082348.GA26581@inner.h.apk.li/ is about the .idx going away during a fetch. A similar thing is happening here, but instead of the .idx file going away, your source repository is repacking (and thus getting rid of loose object files). Here, I think the issue is less complicated. Since you're cloning from a local repository, the 'git clone' command calls 'clone_local()', which in turn calls 'copy_or_link_directory()'. If the directory being copied changes while being iterated over, the receiving end isn't guaranteed to pick up the changes. Worse, if the source _removes_ a file that hasn't yet been copied, over, then the copy will fail, which is what you're seeing here. One workaround would be to clone your repositories locally with '--shared', which won't copy any objects from the source repository, but instead mark its object store as an alternate to the newly created one. > I wonder if there is a way that Git could guard cases where background > gc processes may still be running? Perhaps Git could take some sort of lock when writing to the object store, but an flock wouldn't work since we'd want to allow multiple readers to acquire the lock simultaneously, so long as there is no writer. Thanks, Taylor ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: suspected race between packing and fetch (single case study) 2021-01-08 18:48 ` Taylor Blau @ 2021-01-09 22:11 ` Junio C Hamano 2021-01-11 19:25 ` Taylor Blau 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2021-01-09 22:11 UTC (permalink / raw) To: Taylor Blau; +Cc: Adina Wagner, git Taylor Blau <me@ttaylorr.com> writes: > Here, I think the issue is less complicated. Since you're cloning from a > local repository, the 'git clone' command calls 'clone_local()', which > in turn calls 'copy_or_link_directory()'. If the directory being copied > changes while being iterated over, the receiving end isn't guaranteed to > pick up the changes. > > Worse, if the source _removes_ a file that hasn't yet been copied, over, > then the copy will fail, which is what you're seeing here. And the source that removes a file during a repack would create a new file to keep the contents of the removed file available (if the object still matters after the repack), but because we do not retry our "cp -r" equivalent used in the clone_local(), we may not pick such a new file up. So, we probalby should document "git clone --local" that the user should expect fallout similar to what may happen when they copy a directory hierarchy with "cp -r src dst" and muck with what is in "src" while the copy is ongoing. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: suspected race between packing and fetch (single case study) 2021-01-09 22:11 ` Junio C Hamano @ 2021-01-11 19:25 ` Taylor Blau 2021-01-12 17:46 ` yoh 0 siblings, 1 reply; 7+ messages in thread From: Taylor Blau @ 2021-01-11 19:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Taylor Blau, Adina Wagner, git On Sat, Jan 09, 2021 at 02:11:55PM -0800, Junio C Hamano wrote: > So, we probalby should document "git clone --local" that the user > should expect fallout similar to what may happen when they copy a > directory hierarchy with "cp -r src dst" and muck with what is in > "src" while the copy is ongoing. Mm, good idea. Below the cut line is a patch to do just that. I thought briefly about documenting it in the pack-protocol page, but it only mentions the local transport in passing, so it seemed inappropriate to add that much more detail there. --- 8< --- Subject: [PATCH] Documentation/git-clone.txt: document race with --local When running 'git clone --local', the operation may fail if another process is modifying the source repository. Document that this race condition is known to hopefully help anyone who may run into it. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/git-clone.txt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt index 876aedcd47..02d9c19cec 100644 --- a/Documentation/git-clone.txt +++ b/Documentation/git-clone.txt @@ -57,6 +57,10 @@ repository is specified as a URL, then this flag is ignored (and we never use the local optimizations). Specifying `--no-local` will override the default when `/path/to/repo` is given, using the regular Git transport instead. ++ +*NOTE*: this operation can race with concurrent modification to the +source repository, similar to running `cp -r src dst` while modifying +`src`. --no-hardlinks:: Force the cloning process from a repository on a local -- 2.30.0.138.g6d7191ea01 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: suspected race between packing and fetch (single case study) 2021-01-11 19:25 ` Taylor Blau @ 2021-01-12 17:46 ` yoh 2021-01-12 18:47 ` Taylor Blau 0 siblings, 1 reply; 7+ messages in thread From: yoh @ 2021-01-12 17:46 UTC (permalink / raw) To: git On Mon, 11 Jan 2021, Taylor Blau wrote: > ++ > +*NOTE*: this operation can race with concurrent modification to the > +source repository, similar to running `cp -r src dst` while modifying > +`src`. Couldn't `gc` be triggered by git in seemingly read-only operations, thus possibly ruining the analogy with `cp` while doing `rm` (explicit intent to modify)? Moreover, situation is also a bit different since a sane user script would not place `rm` into background to keep operating on original source right before doing `cp` -- and that is what is happening here: `git` operation is presumably complete (but leaves `gc` running in the background) and script advances to the next step only to run into a race condition with that preceding `git` command which apparently triggered `gc`. Should then any script which operates on local `git` repositories not to forget to add -c gc.autodetach=0 for every git invocation which might be potentially effected? Cheers, -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 WWW: http://www.linkedin.com/in/yarik ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: suspected race between packing and fetch (single case study) 2021-01-12 17:46 ` yoh @ 2021-01-12 18:47 ` Taylor Blau 2021-01-13 14:55 ` yoh 0 siblings, 1 reply; 7+ messages in thread From: Taylor Blau @ 2021-01-12 18:47 UTC (permalink / raw) To: yoh; +Cc: git On Tue, Jan 12, 2021 at 12:46:22PM -0500, yoh@onerussian.com wrote: > > On Mon, 11 Jan 2021, Taylor Blau wrote: > > ++ > > +*NOTE*: this operation can race with concurrent modification to the > > +source repository, similar to running `cp -r src dst` while modifying > > +`src`. > > Couldn't `gc` be triggered by git in seemingly read-only operations, > thus possibly ruining the analogy with `cp` while doing `rm` (explicit > intent to modify)? > > Moreover, situation is also a bit different since a sane user script > would not place `rm` into background to keep operating on original > source right before doing `cp` -- and that is what is happening here: If you're suggesting that something is missing from the above patch, I'm not sure I quite understand what you would like added. All of these (background gc, explicit rm-ing) fall under the category of "concurrent modification": they are changing the source directory in some way while a read operation is taking place. > `git` operation is presumably complete (but leaves `gc` running in the > background) and script advances to the next step only to run into a race > condition with that preceding `git` command which apparently triggered > `gc`. Should then any script which operates on local `git` repositories > not to forget to add -c gc.autodetach=0 for every git > invocation which might be potentially effected? If your workflow is that you are frequently cloning via the local transport and there is no other synchronization going on between whatever work is happening in the source repository, then yes. (But note of course that you can set gc.autodetach=0 via the source repository's .git/config rather than typing it each time). Thanks, Taylor ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: suspected race between packing and fetch (single case study) 2021-01-12 18:47 ` Taylor Blau @ 2021-01-13 14:55 ` yoh 0 siblings, 0 replies; 7+ messages in thread From: yoh @ 2021-01-13 14:55 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Adina Wagner On Tue, 12 Jan 2021, Taylor Blau wrote: > > > ++ > > > +*NOTE*: this operation can race with concurrent modification to the > > > +source repository, similar to running `cp -r src dst` while modifying > > > +`src`. > > Couldn't `gc` be triggered by git in seemingly read-only operations, > > thus possibly ruining the analogy with `cp` while doing `rm` (explicit > > intent to modify)? > > Moreover, situation is also a bit different since a sane user script > > would not place `rm` into background to keep operating on original > > source right before doing `cp` -- and that is what is happening here: > If you're suggesting that something is missing from the above patch, I'm > not sure I quite understand what you would like added. Slept on it. I think your patch (doc disclaimer) is factually correct and probably as good as it can get. Not yet sure if it is worth explicit mentioning `gc` or `repack` as one of such concurrent operations. > All of these (background gc, explicit rm-ing) fall under the category of > "concurrent modification": they are changing the source directory in > some way while a read operation is taking place. yes. My comment was more on how such modifications are triggered: via explicit actions (e.g. `rm`) intended to modify vs as a "house keeping running in the background", which is the case of gc in particular when triggered by seemingly read-only operations. > > `git` operation is presumably complete (but leaves `gc` running in the > > background) and script advances to the next step only to run into a race > > condition with that preceding `git` command which apparently triggered > > `gc`. Should then any script which operates on local `git` repositories > > not to forget to add -c gc.autodetach=0 for every git > > invocation which might be potentially effected? > If your workflow is that you are frequently cloning via the local > transport and there is no other synchronization going on between > whatever work is happening in the source repository, then yes. (But note > of course that you can set gc.autodetach=0 via the source repository's > .git/config rather than typing it each time). IMHO it affects efficiency, become cumbersome (for git users), and thus might be error-prone: e.g. gc.autodetach=0 is necessity only to mitigate only for a possible subsequent `clone` invocation operating locally. Higher level constructs siting on top of `git` would not know what is the next command ran in the user script (like in our case of datalad) to set such config variable for their invocations. Adding gc.autodetach=0 to every single `git` invocation would effect our efficiency. User might not be made aware of such necessity for using `git clone` on local repositories, only after having their scripts deployed and at some random points in time start hitting the race condition and go "google" and RTFM mode to figure out what is going on. That is why I am more in-line with your initial comment in https://lore.kernel.org/git/X%2FipCPFyW3gAWrHo@nand.local/ : > Perhaps Git could take some sort of lock when writing to the object > store, but an flock wouldn't work since we'd want to allow multiple > readers to acquire the lock simultaneously, so long as there is no > writer. I think it would be nice to have `clone_local()` first check that there is no ongoing modifications happening before proceeding and wait some reasonable amount of time (up to ?0 sec?) if still ongoing, and then fail "informatively" if still cannot clone. Even though it would not prevent race condition in full (`clone_local` might check and initiate, and then some process starts altering while `clone_local` is ongoing), it would mitigate any scripted cases of a local `git clone` following some heavy manipulations of original repository which triggers background gc. -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 WWW: http://www.linkedin.com/in/yarik ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-01-13 14:58 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <fe9babc8-a3ee-6be4-e4f8-9690cb7c79bd@fz-juelich.de> 2021-01-08 16:39 ` suspected race between packing and fetch (single case study) Adina Wagner 2021-01-08 18:48 ` Taylor Blau 2021-01-09 22:11 ` Junio C Hamano 2021-01-11 19:25 ` Taylor Blau 2021-01-12 17:46 ` yoh 2021-01-12 18:47 ` Taylor Blau 2021-01-13 14:55 ` yoh
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).