git@vger.kernel.org list mirror (unofficial, one of many)
From: Martin Fick <mfick@codeaurora.org>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: pack corruption post-mortem
Date: Wed, 16 Oct 2013 09:41:16 -0600
Message-ID: <201310160941.16904.mfick@codeaurora.org>
In-Reply-To: <20131016083400.GA31266@sigill.intra.peff.net>

On Wednesday, October 16, 2013 02:34:01 am Jeff King wrote:
> I was recently presented with a repository with a
> corrupted packfile, and was asked if the data was
> recoverable. This post-mortem describes the steps I took
> to investigate and fix the problem. I thought others
> might find the process interesting, and it might help
> somebody in the same situation.

This is awesome, Peff. Thanks for the great writeup!

I have nightmares about this sort of thing every now and
then, and we even experience some corruption here and there
that needs to be fixed (mainly missing objects when we toy
with different git repack arguments).  I cannot help but
wonder how we can improve git further to help diagnose, or
even fix, some of these problems.  More inline below...


> The first thing I did was pull the broken data out of the
> packfile. I needed to know how big the object was, which
> I found out with:
> 
>   $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
>   51653873
>   51664736
> 
> Show-index gives us the list of objects and their
> offsets. We throw away everything but the offsets, and
> then sort them so that our interesting offset (which we
> got from the fsck output above) is followed immediately
> by the offset of the next object. Now we know that the
> object data is 10863 bytes long, and we can grab it
> with:
> 
>   dd if=$pack of=object bs=1 skip=51653873 count=10863

Is there a current plumbing command that should be enhanced
to do the 2 steps above (finding the object's length from
the index and extracting its raw bytes) directly for people
debugging, maybe with some new switch?  If not, should we
create one, such as git show --zlib or git cat-file --zlib?
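
To make the two steps concrete, here is a minimal Python
sketch of what such a command would have to do. It is not an
existing git feature; object_span and read_object are
hypothetical names, and the sketch assumes the offsets have
already been read from the index (for example with git
show-index) and that the pack ends in a 20-byte SHA-1
trailer.

```python
import bisect


def object_span(offsets, target, pack_size):
    """Return (offset, length) of the packed object that starts at target.

    offsets   -- every object offset listed in the .idx (any order)
    pack_size -- size of the .pack file in bytes
    """
    ordered = sorted(offsets)
    i = bisect.bisect_left(ordered, target)
    if i == len(ordered) or ordered[i] != target:
        raise ValueError("no object starts at offset %d" % target)
    # An object ends where the next one begins. For the last object
    # we assume a 20-byte SHA-1 pack trailer follows it.
    end = ordered[i + 1] if i + 1 < len(ordered) else pack_size - 20
    return target, end - target


def read_object(pack_path, offset, length):
    """Read the raw packed bytes (header plus zlib data), like the dd call."""
    with open(pack_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

With the offsets quoted above, object_span returns a length
of 51664736 - 51653873 = 10863 bytes, matching the count
passed to dd.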


> Note that the "object" file isn't fit for feeding
> straight to zlib; it has the git packed object header,
> which is variable-length. We want to strip that off so
> we can start playing with the zlib data directly. You
> can either work your way through it manually (the format
> is described in
> Documentation/technical/pack-format.txt), or you can
> walk through it in a debugger. I did the latter,
> creating a valid pack like:
> 
>   # pack magic and version
>   printf 'PACK\0\0\0\2' >tmp.pack
>   # pack has one object
>   printf '\0\0\0\1' >>tmp.pack
>   # now add our object data
>   cat object >>tmp.pack
>   # and then append the pack trailer
>   /path/to/git.git/test-sha1 -b <tmp.pack >trailer
>   cat trailer >>tmp.pack
> 
> and then running "git index-pack tmp.pack" in the
> debugger (stop at unpack_raw_entry). Doing this, I found
> that there were 3 bytes of header (and the header itself
> had a sane type and size). So I stripped those off with:
> 
>   dd if=object of=zlib bs=1 skip=3

This too feels like something we should eventually be able
to do with a plumbing command, perhaps:

  git zlib-extract
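
As a rough illustration of the header-stripping step, the
sketch below parses the variable-length header described in
Documentation/technical/pack-format.txt and reports where
the zlib data starts. pack_entry_header is a hypothetical
name, not a git function, and the handling of delta entries
is included only so the returned position is right for them
too.

```python
def pack_entry_header(data):
    """Parse a packed-object header; return (type, size, zlib_start).

    data is the raw entry as cut out of the pack (header first).
    """
    byte = data[0]
    obj_type = (byte >> 4) & 0x7   # bits 6-4 of the first byte
    size = byte & 0x0F             # low 4 bits of the inflated size
    shift = 4
    pos = 1
    while byte & 0x80:             # MSB set: another size byte follows
        byte = data[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    if obj_type == 6:              # OFS_DELTA: skip the variable-length base offset
        while data[pos] & 0x80:
            pos += 1
        pos += 1
    elif obj_type == 7:            # REF_DELTA: skip the 20-byte base object id
        pos += 20
    return obj_type, size, pos
```

For the object in the post-mortem, zlib_start would be 3,
the same number found in the debugger and passed to dd as
skip=3.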

> So I took a different approach. Working under the guess
> that the corruption was limited to a single byte, I
> wrote a program to munge each byte individually, and try
> inflating the result. Since the object was only 10K
> compressed, that worked out to about 2.5M attempts,
> which took a few minutes.

Awesome!  Would this make a good new plumbing command, git 
zlib-fix?
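
A minimal sketch of the single-byte search, written in
Python rather than as Peff's actual program (which is not
shown in this message). The names inflates and munge are
illustrative, and the sketch assumes the zlib stream has
already been separated from the pack header.

```python
import zlib


def inflates(data):
    """Return True if data is a complete, valid zlib stream."""
    try:
        d = zlib.decompressobj()
        d.decompress(data)
        return d.eof   # True only if the end of the stream was reached
    except zlib.error:
        return False


def munge(data):
    """Yield (offset, byte) substitutions after which data inflates."""
    buf = bytearray(data)
    for i in range(len(buf)):
        original = buf[i]
        for candidate in range(256):
            if candidate == original:
                continue
            buf[i] = candidate
            if inflates(bytes(buf)):
                yield i, candidate
        buf[i] = original   # restore before trying the next position
```

More than one substitution may inflate cleanly, so each
candidate should still be checked against the SHA-1 the
object is expected to have.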


> I fixed the packfile itself with:
> 
>   chmod +w $pack
>   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
>   chmod -w $pack
> 
> The '\xc7' comes from the replacement byte our "munge"
> program found. The offset 51659518 is derived by taking
> the original object offset (51653873), adding the
> replacement offset found by "munge" (5642), and then
> adding back in the 3 bytes of git header we stripped.

Another plumbing command needed?  git pack-put --zlib?
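
The offset arithmetic and the in-place write can be sketched
as follows. patch_offset and patch_byte are hypothetical
helpers standing in for the printf | dd pipeline, not git
commands, and they assume the pack file has already been
made writable.

```python
def patch_offset(object_offset, zlib_offset, header_len):
    """Translate an offset inside the zlib stream to an offset in the pack."""
    return object_offset + header_len + zlib_offset


def patch_byte(path, offset, value):
    """Overwrite one byte in place without truncating the file."""
    with open(path, "r+b") as f:   # "r+b" keeps the rest of the file intact
        f.seek(offset)
        f.write(bytes([value]))
```

With the numbers above, patch_offset(51653873, 5642, 3)
gives 51659518, the seek value used with dd, and the byte to
write is 0xc7.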

I am not saying my command suggestions are good, but maybe 
they will inspire the right answer?

-Martin
