git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: propagating repo corruption across clone
Date: Sun, 24 Mar 2013 14:31:33 -0400	[thread overview]
Message-ID: <20130324183133.GA11200@sigill.intra.peff.net> (raw)

I saw this post-mortem on recent disk corruption seen on git.kde.org:

  http://jefferai.org/2013/03/24/too-perfect-a-mirror/

The interesting bit to me is that object corruption propagated across a
clone (and oddly, that --mirror made complaints about corruption go
away). I did a little testing and found some curious results (this ended
up long; skip to the bottom for my conclusions).

Here's a fairly straight-forward corruption recipe:

-- >8 --
obj_to_file() {
  echo ".git/objects/$(echo $1 | sed 's,..,&/,')"
}

# corrupt a single byte inside the object
corrupt_object() {
  fn=$(obj_to_file "$1") &&
  chmod +w "$fn" &&
  printf '\0' | dd of="$fn" bs=1 conv=notrunc seek=10
}

git init repo &&
cd repo &&
echo content >file &&
git add file &&
git commit -m one &&
corrupt_object $(git rev-parse HEAD:file)
-- 8< --

report git clone . fast-local
report git clone --no-local . no-local
report git -c transfer.unpackLimit=1 clone --no-local . index-pack
report git -c fetch.fsckObjects=1 clone --no-local . fsck

and here is how clone reacts in a few situations:

  $ git clone --bare . local-bare && echo WORKED
  Cloning into bare repository 'local-bare'...
  done.
  WORKED

We don't notice the problem during the transport phase, which is to be
expected; we're using the fast "just hardlink it" code path. So that's
OK.

  $ git clone . local-tree && echo WORKED
  Cloning into 'local-tree'...
  done.
  error: inflate: data stream error (invalid distance too far back)
  error: unable to unpack d95f3ad14dee633a758d2e331151e950dd13e4ed header
  WORKED

We _do_ see a problem during the checkout phase, but we don't propagate
a checkout failure to the exit code from clone.  That is bad in general,
and should probably be fixed. Though it would never find corruption of
older objects in the history, anyway, so checkout should not be relied
on for robustness.

  $ git clone --no-local . non-local && echo WORKED
  Cloning into 'non-local'...
  remote: Counting objects: 3, done.
  remote: error: inflate: data stream error (invalid distance too far back)
  remote: error: unable to unpack d95f3ad14dee633a758d2e331151e950dd13e4ed header
  remote: error: inflate: data stream error (invalid distance too far back)
  remote: fatal: loose object d95f3ad14dee633a758d2e331151e950dd13e4ed (stored in ./objects/d9/5f3ad14dee633a758d2e331151e950dd13e4ed) is corrupt
  error: git upload-pack: git-pack-objects died with error.
  fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
  remote: aborting due to possible repository corruption on the remote side.
  fatal: early EOF
  fatal: index-pack failed

Here we detect the error. It's noticed by pack-objects on the remote
side as it tries to put the bogus object into a pack. But what if we
already have a pack that's been corrupted, and pack-objects is just
pushing out entries without doing any recompression?

Let's change our corrupt_object to:

  corrupt_object() {
    git repack -ad &&
    pack=`echo .git/objects/pack/*.pack` &&
    chmod +w "$pack" &&
    printf '\0' | dd of="$pack" bs=1 conv=notrunc seek=175
  }

and try again:

  $ git clone --no-local . non-local && echo WORKED
  Cloning into 'non-local'...
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 3 (delta 0)
  error: inflate: data stream error (invalid distance too far back)
  fatal: pack has bad object at offset 169: inflate returned -3
  fatal: index-pack failed

Great, we still notice the problem in unpack-objects on the receiving
end. But what if there's a more subtle corruption, where filesystem
corruption points the directory entry for one object at the inode of
another. Like:

  corrupt_object() {
    corrupt=$(echo corrupted | git hash-object -w --stdin) &&
    mv -f $(obj_to_file $corrupt) $(obj_to_file $1)
  }

This is going to be more subtle, because the object in the packfile is
self-consistent but the object graph as a whole is broken.

  $ git clone --no-local . non-local && echo WORKED
  Cloning into 'non-local'...
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 0 (delta 0)
  Receiving objects: 100% (3/3), done.
  error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
  WORKED

Like the --local cases earlier, we notice the missing object during the
checkout phase, but do not correctly propagate the error.

We do not notice the sha1 mis-match on the sending side (which we could,
if we checked the sha1 as we were sending). We do not notice the broken
object graph during the receive process either. I would have expected
check_everything_connected to handle this, but we don't actually call it
during clone! If you do this:

  $ git init non-local && cd non-local && git fetch ..
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 3 (delta 0)
  Unpacking objects: 100% (3/3), done.
  fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
  error: .. did not send all necessary objects

we do notice.

And one final check:

  $ git -c transfer.fsckobjects=1 clone --no-local . fsck
  Cloning into 'fsck'...
  remote: Counting objects: 3, done.
  remote: Total 3 (delta 0), reused 3 (delta 0)
  Receiving objects: 100% (3/3), done.
  error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
  fatal: object of unexpected type
  fatal: index-pack failed

Fscking the incoming objects does work, but of course it comes at a cost
in the normal case (for linux-2.6, I measured an increase in CPU time
with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
it is probably overkill for finding corruption; index-pack already
recognizes bit corruption inside an object, and
check_everything_connected can detect object graph problems much more
cheaply.

One thing I didn't check is bit corruption inside a packed object that
still correctly zlib inflates. check_everything_connected will end up
reading all of the commits and trees (to walk them), but not the blobs.
And I don't think that we explicitly re-sha1 every incoming object (only
if we detect a possible collision). So it may be that
transfer.fsckObjects would save us there (it also introduces new
problems if there are ignorable warnings in the objects you receive,
like zero-padded trees).

So I think at the very least we should:

  1. Make sure clone propagates errors from checkout to the final exit
     code.

  2. Teach clone to run check_everything_connected.

I don't have details on the KDE corruption, or why it wasn't detected
(if it was one of the cases I mentioned above, or a more subtle issue).

-Peff

             reply	other threads:[~2013-03-24 18:32 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-24 18:31 Jeff King [this message]
2013-03-24 19:01 ` propagating repo corruption across clone Ævar Arnfjörð Bjarmason
2013-03-24 19:23   ` Jeff King
2013-03-25 13:43     ` Jeff Mitchell
2013-03-25 14:56       ` Jeff King
2013-03-25 15:31         ` Duy Nguyen
2013-03-25 15:56           ` Jeff King
2013-03-25 16:32             ` Jeff Mitchell
2013-03-25 20:07               ` Jeff King
2013-03-26 13:43                 ` Jeff Mitchell
2013-03-26 16:55                   ` Jeff King
2013-03-26 21:59                     ` Philip Oakley
2013-03-26 22:03                       ` Jeff King
2013-03-26 23:20                     ` Rich Fromm
2013-03-27  1:25                       ` Jonathan Nieder
2013-03-27 18:23                         ` Rich Fromm
2013-03-27 19:49                           ` Jeff King
2013-03-27 20:04                             ` Jeff King
2013-03-27  3:47                       ` Junio C Hamano
2013-03-27  6:19                         ` Sitaram Chamarty
2013-03-27 15:03                           ` Junio C Hamano
2013-03-27 15:47                             ` Sitaram Chamarty
2013-03-27 18:51                         ` Rich Fromm
2013-03-27 19:13                           ` Junio C Hamano
2013-03-28 13:52                           ` Jeff Mitchell
2013-03-28 13:48                         ` Jeff Mitchell
2013-03-26  1:06             ` Duy Nguyen
2013-03-24 19:16 ` Ilari Liusvaara
2013-03-25 20:01 ` Junio C Hamano
2013-03-25 20:05   ` Jeff King
2013-03-25 20:14 ` [PATCH 0/9] corrupt object potpourri Jeff King
2013-03-25 20:16   ` [PATCH 1/9] stream_blob_to_fd: detect errors reading from stream Jeff King
2013-03-26 21:27     ` Junio C Hamano
2013-03-25 20:17   ` [PATCH 2/9] check_sha1_signature: check return value from read_istream Jeff King
2013-03-25 20:18   ` [PATCH 3/9] read_istream_filtered: propagate read error from upstream Jeff King
2013-03-25 20:21   ` [PATCH 4/9] avoid infinite loop in read_istream_loose Jeff King
2013-03-25 20:21   ` [PATCH 5/9] add test for streaming corrupt blobs Jeff King
2013-03-25 21:10     ` Jonathan Nieder
2013-03-25 21:26       ` Jeff King
2013-03-27 20:27     ` Jeff King
2013-03-27 20:35       ` Junio C Hamano
2013-03-25 20:22   ` [PATCH 6/9] streaming_write_entry: propagate streaming errors Jeff King
2013-03-25 21:35     ` Eric Sunshine
2013-03-25 21:37       ` Jeff King
2013-03-25 21:39     ` Jonathan Nieder
2013-03-25 21:49       ` [PATCH v2 " Jeff King
2013-03-25 23:29         ` Jonathan Nieder
2013-03-26 21:38         ` Junio C Hamano
2013-03-25 20:22   ` [PATCH 7/9] add tests for cloning corrupted repositories Jeff King
2013-03-25 20:23   ` [PATCH 8/9] clone: die on errors from unpack_trees Jeff King
2013-03-26 21:40     ` Junio C Hamano
2013-03-26 22:22       ` [PATCH 10/9] clone: leave repo in place after checkout errors Jeff King
2013-03-26 22:32         ` Jonathan Nieder
2013-03-27  1:03           ` Jeff King
2013-03-25 20:26   ` [PATCH 9/9] clone: run check_everything_connected Jeff King
2013-03-26  0:53     ` Duy Nguyen
2013-03-26 22:24       ` Jeff King
2013-03-26 21:50     ` Junio C Hamano
2013-03-28  0:40     ` Duy Nguyen
2013-03-31  7:57       ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130324183133.GA11200@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).