From: Martin Fick <mfick@codeaurora.org>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: pack corruption post-mortem
Date: Wed, 16 Oct 2013 09:41:16 -0600
Message-ID: <201310160941.16904.mfick@codeaurora.org>
In-Reply-To: <20131016083400.GA31266@sigill.intra.peff.net>
On Wednesday, October 16, 2013 02:34:01 am Jeff King wrote:
> I was recently presented with a repository with a
> corrupted packfile, and was asked if the data was
> recoverable. This post-mortem describes the steps I took
> to investigate and fix the problem. I thought others
> might find the process interesting, and it might help
> somebody in the same situation.
This is awesome Peff, thanks for the great writeup!
I have nightmares about this sort of thing every now and
then, and we even experience some corruption here and there
that needs to be fixed (mainly missing objects when we toy
with different git repack arguments). I cannot help but
wonder how we can improve git further to help diagnose, or
even fix, some of these problems. More inline below...
> The first thing I did was pull the broken data out of the
> packfile. I needed to know how big the object was, which
> I found out with:
>
> $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
> 51653873
> 51664736
>
> Show-index gives us the list of objects and their
> offsets. We throw away everything but the offsets, and
> then sort them so that our interesting offset (which we
> got from the fsck output above) is followed immediately
> by the offset of the next object. Now we know that the
> object data is 10863 bytes long, and we can grab it
> with:
>
> dd if=$pack of=object bs=1 skip=51653873 count=10863
Is there a current plumbing command that could be enhanced
(maybe with some new switch) to do the 2 steps above
directly for people debugging? If not, should we create
one, say git show --zlib or git cat-file --zlib?
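
To make the two steps concrete, here is a rough Python sketch
of what such a helper might do. It is not an existing git
command; object_span and read_object are hypothetical names,
and it assumes a SHA-1 pack whose trailer is 20 bytes. The
offsets would come from the first column of git show-index.

```python
import os

TRAILER_LEN = 20  # assumption: SHA-1 checksum closing the pack


def object_span(offsets, start, pack_size):
    """Return (start, length) of the packed object at `start`.

    `offsets` holds every object offset in the pack, in any
    order (e.g. the first column of `git show-index`).
    """
    if start not in offsets:
        raise ValueError("no object starts at offset %d" % start)
    later = [o for o in offsets if o > start]
    # The last object in the pack runs up to the trailer.
    end = min(later) if later else pack_size - TRAILER_LEN
    return start, end - start


def read_object(pack_path, offsets, start):
    """Read one object's raw bytes (git header plus zlib data)."""
    size = os.path.getsize(pack_path)
    _, length = object_span(offsets, start, size)
    with open(pack_path, "rb") as f:
        f.seek(start)
        return f.read(length)
```

With the offsets from the quoted example, object_span gives a
length of 51664736 - 51653873 = 10863 bytes, matching the
count passed to dd.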
> Note that the "object" file isn't fit for feeding
> straight to zlib; it has the git packed object header,
> which is variable-length. We want to strip that off so
> we can start playing with the zlib data directly. You
> can either work your way through it manually (the format
> is described in
> Documentation/technical/pack-format.txt), or you can
> walk through it in a debugger. I did the latter,
> creating a valid pack like:
>
> # pack magic and version
> printf 'PACK\0\0\0\2' >tmp.pack
> # pack has one object
> printf '\0\0\0\1' >>tmp.pack
> # now add our object data
> cat object >>tmp.pack
> # and then append the pack trailer
> /path/to/git.git/test-sha1 -b <tmp.pack >trailer
> cat trailer >>tmp.pack
>
> and then running "git index-pack tmp.pack" in the
> debugger (stop at unpack_raw_entry). Doing this, I found
> that there were 3 bytes of header (and the header itself
> had a sane type and size). So I stripped those off with:
>
> dd if=object of=zlib bs=1 skip=3
This too feels like something we should be able to do with a
plumbing command eventually?
git zlib-extract
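
As a sketch of what the header-stripping part could look like
without a debugger, the Python below walks the variable-length
header as I understand it from pack-format.txt. The function
names are made up for illustration, and it only handles
non-delta objects; delta entries carry extra base information
after the size bytes that this does not parse.

```python
def parse_object_header(data):
    """Return (type, size, header_len) for a packed object.

    Byte 0: continuation bit, 3 type bits, low 4 size bits.
    Later bytes: continuation bit, then 7 more size bits each.
    """
    byte = data[0]
    obj_type = (byte >> 4) & 0x7
    size = byte & 0x0F
    shift = 4
    used = 1
    while byte & 0x80:
        byte = data[used]
        size |= (byte & 0x7F) << shift
        shift += 7
        used += 1
    return obj_type, size, used


def zlib_payload(data):
    """Strip the header from a commit, tree, blob or tag entry."""
    obj_type, _, used = parse_object_header(data)
    if obj_type not in (1, 2, 3, 4):
        raise ValueError("delta or unknown type %d" % obj_type)
    return data[used:]
```

For the object in the quoted text, header_len would be the 3
bytes that were skipped with dd, and size is the inflated
length the zlib data is expected to produce.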
> So I took a different approach. Working under the guess
> that the corruption was limited to a single byte, I
> wrote a program to munge each byte individually, and try
> inflating the result. Since the object was only 10K
> compressed, that worked out to about 2.5M attempts,
> which took a few minutes.
Awesome! Would this make a good new plumbing command, git
zlib-fix?
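
For illustration, the single-byte search could be sketched in
Python roughly as below. This is not Peff's "munge" program,
just my guess at the idea: try every other value at every
position and keep the candidates that inflate cleanly. The
function names and the optional expected_size check (the size
from the git object header) are assumptions of mine.

```python
import zlib


def try_inflate(data):
    """Return the inflated bytes, or None if the stream is bad."""
    try:
        d = zlib.decompressobj()
        out = d.decompress(data)
    except zlib.error:
        return None
    # Without eof, the stream ended early (truncated data).
    return out if d.eof else None


def find_single_byte_fixes(data, expected_size=None):
    """List (offset, value) pairs whose substitution inflates."""
    buf = bytearray(data)
    hits = []
    for i in range(len(buf)):
        original = buf[i]
        for value in range(256):
            if value == original:
                continue
            buf[i] = value
            out = try_inflate(bytes(buf))
            if out is None:
                continue
            if expected_size is None or len(out) == expected_size:
                hits.append((i, value))
        buf[i] = original  # restore before moving on
    return hits
```

A hit only says that zlib and its checksum are satisfied; the
result would still need to be checked against the object's
SHA-1 before trusting it.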
> I fixed the packfile itself with:
>
> chmod +w $pack
> printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
> chmod -w $pack
>
> The '\xc7' comes from the replacement byte our "munge"
> program found. The offset 51659518 is derived by taking
> the original object offset (51653873), adding the
> replacement offset found by "munge" (5642), and then
> adding back in the 3 bytes of git header we stripped.
Another plumbing command needed? git pack-put --zlib?
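
The offset arithmetic and the in-place write are simple enough
to sketch in Python. Again, pack_offset and patch_byte are
hypothetical helpers, not anything git provides, and the pack
file would need to be made writable first, as in the quoted
chmod.

```python
def pack_offset(object_offset, zlib_offset, header_len):
    """Map an offset within the zlib data back into the pack."""
    return object_offset + header_len + zlib_offset


def patch_byte(path, offset, value):
    """Overwrite one byte in place; return the byte it replaced."""
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("offset %d is past end of file" % offset)
        f.seek(offset)
        f.write(bytes([value]))
    return old[0]
```

With the numbers from the quoted text, pack_offset(51653873,
5642, 3) gives 51659518, the seek value used with dd.
Returning the old byte makes it easy to undo a wrong guess.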
I am not saying my command suggestions are good, but maybe
they will inspire the right answer?
-Martin