From: Martin Fick <firstname.lastname@example.org>
To: Jeff King <email@example.com>
Cc: firstname.lastname@example.org
Subject: Re: pack corruption post-mortem
Date: Wed, 16 Oct 2013 09:41:16 -0600
Message-ID: <email@example.com>
In-Reply-To: <20131016083400.GA31266@sigill.intra.peff.net>

On Wednesday, October 16, 2013 02:34:01 am Jeff King wrote:
> I was recently presented with a repository with a
> corrupted packfile, and was asked if the data was
> recoverable. This post-mortem describes the steps I took
> to investigate and fix the problem. I thought others
> might find the process interesting, and it might help
> somebody in the same situation.

This is awesome Peff, thanks for the great writeup!

I have nightmares about this sort of thing every now and
then, and we even experience some corruption here and there
that needs to be fixed (mainly missing objects when we toy
with different git repack arguments). I cannot help but
wonder, how we can improve git further to either help
diagnose or even fix some of these problems? More inline
below...

> The first thing I did was pull the broken data out of the
> packfile. I needed to know how big the object was, which
> I found out with:
>
>   $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
>   51653873
>   51664736
>
> Show-index gives us the list of objects and their
> offsets. We throw away everything but the offsets, and
> then sort them so that our interesting offset (which we
> got from the fsck output above) is followed immediately
> by the offset of the next object. Now we know that the
> object data is 10863 bytes long, and we can grab it
> with:
>
>   dd if=$pack of=object bs=1 skip=51653873 count=10863

Is there a current plumbing command that should be enhanced
to be able to do the 2 steps above directly for people
debugging (maybe with some new switch)? If not, should we
create one, git show --zlib, or git cat-file --zlib?

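Until something like that exists, the two quoted steps can be
wrapped in a small shell helper. This is only a sketch under
stated assumptions: object_span is a made-up name, not an
existing git command, and it does not handle the last object
in a pack, which has no following offset and ends where the
pack trailer checksum begins.

```shell
# object_span OFFSET
# Hypothetical helper (not part of git): reads `git show-index` output on
# stdin and prints "OFFSET LENGTH" for the object that starts at OFFSET.
# Exits non-zero if OFFSET is not found or belongs to the last object.
object_span() {
    cut -d' ' -f1 | sort -n | awk -v want="$1" '
        hit        { print want, $1 - want; ok = 1; exit }
        $1 == want { hit = 1 }
        END        { if (!ok) exit 1 }
    '
}
```

With the numbers from the post-mortem, it could be used as:

  set -- $(git show-index <$idx | object_span 51653873)
  dd if=$pack of=object bs=1 skip=$1 count=$2
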
> Note that the "object" file isn't fit for feeding
> straight to zlib; it has the git packed object header,
> which is variable-length. We want to strip that off so
> we can start playing with the zlib data directly. You
> can either work your way through it manually (the format
> is described in
> Documentation/technical/pack-format.txt), or you can
> walk through it in a debugger. I did the latter,
> creating a valid pack like:
>
>   # pack magic and version
>   printf 'PACK\0\0\0\2' >tmp.pack
>   # pack has one object
>   printf '\0\0\0\1' >>tmp.pack
>   # now add our object data
>   cat object >>tmp.pack
>   # and then append the pack trailer
>   /path/to/git.git/test-sha1 -b <tmp.pack >trailer
>   cat trailer >>tmp.pack
>
> and then running "git index-pack tmp.pack" in the
> debugger (stop at unpack_raw_entry). Doing this, I found
> that there were 3 bytes of header (and the header itself
> had a sane type and size). So I stripped those off with:
>
>   dd if=object of=zlib bs=1 skip=3

This too feels like something we should be able to do with a
plumbing command eventually?

  git zlib-extract

> So I took a different approach. Working under the guess
> that the corruption was limited to a single byte, I
> wrote a program to munge each byte individually, and try
> inflating the result. Since the object was only 10K
> compressed, that worked out to about 2.5M attempts,
> which took a few minutes.

Awesome! Would this make a good new plumbing command, git
zlib-fix?

> I fixed the packfile itself with:
>
>   chmod +w $pack
>   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
>   chmod -w $pack
>
> The '\xc7' comes from the replacement byte our "munge"
> program found. The offset 51659518 is derived by taking
> the original object offset (51653873), adding the
> replacement offset found by "munge" (5642), and then
> adding back in the 3 bytes of git header we stripped.

Another plumbing command needed? git pack-put --zlib?

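In the meantime, the offset arithmetic and the single-byte
overwrite from the quoted fix can be sketched as two shell
helpers. Both names (pack_offset, patch_byte) are made up for
illustration and are not git commands. The only deliberate
change from the quoted command is an octal escape in place of
'\xc7', since not every printf understands '\x'.

```shell
# pack_offset OBJECT_OFFSET ZLIB_OFFSET HEADER_LEN
# Hypothetical helper: map an offset inside the stripped zlib stream back
# to an absolute offset in the packfile.
pack_offset() {
    echo "$(( $1 + $2 + $3 ))"
}

# patch_byte FILE OFFSET HEX
# Hypothetical helper: overwrite the byte at 0-based OFFSET in FILE with
# the byte whose value is HEX (two hex digits, e.g. c7), leaving the rest
# of the file untouched (conv=notrunc).
patch_byte() {
    printf "\\$(printf '%03o' "0x$3")" |
        dd of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
}
```

With the numbers from the post-mortem, and working on a copy
of the pack first:

  chmod +w $pack
  patch_byte $pack "$(pack_offset 51653873 5642 3)" c7
  chmod -w $pack
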
I am not saying my command suggestions are good, but maybe
they will inspire the right answer?

-Martin