git@vger.kernel.org mailing list mirror (one of many)
From: Adina Wagner <a.wagner@fz-juelich.de>
To: <git@vger.kernel.org>
Subject: suspected race between packing and fetch (single case study)
Date: Fri, 8 Jan 2021 17:39:12 +0100
Message-ID: <e7301aaf-b341-ec0b-9e2d-ab7f60ac58da@fz-juelich.de>
In-Reply-To: <fe9babc8-a3ee-6be4-e4f8-9690cb7c79bd@fz-juelich.de>

Hi,


Colleagues encouraged me to report a "personal" bug I've stumbled
across. It's "personal" because I wasn't able to create a minimal
reproducer, or even to reproduce it with the same script on other
infrastructure. We suspect a race between packing and fetch. The
script I am using is at the bottom of this email.

The script creates a joint Git/git-annex repository A with a large
number of objects. Afterwards, a repository B is created, and A is
cloned into it.
Cloning fails initially. Errors look like this:

+ git clone --progress ../A /tmp/B/subds
Cloning into '/tmp/B/subds'...
fatal: failed to copy file to '/tmp/B/subds/.git/objects/44/93d6041a44b5a7280875ec9b6ecd78fbab7b6e': No such file or directory

Running "ps aux -H | grep git" before and after cloning shows garbage
collection and packing processes still running in repo A, which is why
we suspect a race. Here is the script output showing those processes:
+ cd B
+ ps aux -H
+ grep git
adina     674763  0.0  0.0   6152   836 pts/5    S+   16:38   0:00           grep git
adina     674071  0.0  0.0   9584  2788 ?        Ss   16:38   0:00 /usr/lib/git-core/git gc --auto --no-quiet
adina     674072  0.0  0.0   9584  3884 ?        S    16:38   0:00 /usr/lib/git-core/git repack -d -l --no-write-bitmap-index
adina     674073  149  0.1 583780 20564 ?        R    16:38   0:02         /usr/lib/git-core/git pack-objects --local --delta-base-offset .git/objects/pack/.tmp-674072-pack --keep-true-parents --honor-pack-keep --non-empty --all --reflog --indexed-objects --unpacked  --incremental
+ git clone --progress ../A /tmp/B/subds
Cloning into '/tmp/B/subds'...
fatal: failed to copy file to '/tmp/B/subds/.git/objects/14/5a4c6775684788ecf51e5d745ac19ad5b204e3': No such file or directory
+ ps aux -H
+ grep git
adina     674774  0.0  0.0   6152   896 pts/5    S+   16:38   0:00           grep git
adina     674071  0.0  0.0   9584  2788 ?        Ss   16:38   0:00 /usr/lib/git-core/git gc --auto --no-quiet
adina     674072 11.0  0.0  11160  3884 ?        R    16:38   0:00 /usr/lib/git-core/git repack -d -l --no-write-bitmap-index
bash script.sh  65.71s user 29.53s system 94% cpu 1:40.71 total



Both A and B are completely sane repositories: git fsck reports nothing
out of the ordinary, and I can clone them fine in every situation except
the scripted workflow. If I add a short "sleep" between creating A and
cloning A into B, the error vanishes.
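
Instead of a fixed "sleep", a script could wait until the background gc
in A has finished. The function below is a hypothetical helper, not part
of Git or datalad. It assumes that a running "git gc" keeps its PID in
the repository's .git/gc.pid file and deletes that file on exit, and
that A has a regular .git directory. It is only a heuristic: a stale
gc.pid makes it wait until the timeout, and it cannot rule out a gc that
starts after the check.

```shell
# Hypothetical helper (not part of Git or datalad): wait until no
# "git gc" holds the gc.pid lock in a non-bare repository, giving up
# after $2 seconds (default 300).
# Assumption: a running "git gc" keeps its PID in <repo>/.git/gc.pid
# and removes that file when it exits; a stale gc.pid left behind by a
# killed gc makes this wait for the full timeout.
wait_for_gc() {
    repo=$1
    timeout=${2:-300}
    pidfile="$repo/.git/gc.pid"
    waited=0
    while [ -e "$pidfile" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gc still running in $repo after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script at the bottom of this email, it would be called as
"wait_for_gc ../A || exit 1" right before the "git clone" line.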

I have been able to trigger this reliably with the script for a month.
I am running git version 2.29.2 on Debian testing (bullseye), but also
saw this after downgrading to version 2.24. Besides simply waiting a bit
before the clone, setting "git config --global gc.autodetach false" also
makes the error go away.
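
The setting could also be limited to repository A instead of being set
globally. The function below is a hypothetical helper that only writes
"gc.autoDetach false" into A's local config. The assumption is that the
git processes datalad spawns for A read that local config, so their
auto-gc would run in the foreground and finish before datalad returns.

```shell
# Hypothetical helper: turn off detached auto-gc in one repository's
# local config (.git/config) rather than globally via --global.
# With gc.autoDetach=false, "git gc --auto" stays in the foreground,
# so the git command that triggered it returns only once gc is done.
disable_autodetach() {
    git -C "$1" config gc.autoDetach false
}
```

In the script, it would go right after "datalad create A && cd A" as
"disable_autodetach .", before the download-url step, which is
presumably what triggers the auto-gc.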

I wonder if there is a way for Git to guard against cases where
background gc processes may still be running?


For completeness, here is the script I am using to trigger this on my
machine. We didn't manage to reproduce the behavior on another machine,
and I didn't find a more minimal example (sorry :( ). The script
involves datalad (which uses git-annex):

#!/bin/sh

set -x

# this creates a joint git/git-annex repository
datalad create A && cd A
# this downloads a tarball with ~13,000 JPEGs and extracts it into the
# repository. Data is added to git-annex.
datalad download-url \
     --archive \
     --message "Download Imagenette dataset" \
     'https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz'
# this creates another joint git/git-annex repository
cd ../ && datalad create B
cd B
ps aux -H | grep git
git clone --progress ../A /tmp/B/subds
ps aux -H | grep git


Kind regards,
Adina




