* pack corruption post-mortem
@ 2013-10-16  8:34 Jeff King
  2013-10-16  8:59 ` Duy Nguyen
  ` (3 more replies)

0 siblings, 4 replies; 16+ messages in thread
From: Jeff King @ 2013-10-16  8:34 UTC (permalink / raw)
To: git

I was recently presented with a repository with a corrupted packfile,
and was asked if the data was recoverable. This post-mortem describes
the steps I took to investigate and fix the problem. I thought others
might find the process interesting, and it might help somebody in the
same situation.

I started with an fsck, which found a problem with exactly one object
(I've used $pack and $obj below to keep the output readable, and also
because I'll refer to them later):

  $ git fsck
  error: $pack SHA1 checksum mismatch
  error: index CRC mismatch for object $obj from $pack at offset 51653873
  error: inflate: data stream error (incorrect data check)
  error: cannot unpack $obj from $pack at offset 51653873

The pack checksum failing means a byte is munged somewhere, and it is
presumably in the object mentioned (since both the index checksum and
zlib were failing).

Reading the zlib source code, I found that "incorrect data check" means
that the adler-32 checksum at the end of the zlib data did not match the
inflated data. So stepping the data through zlib would not help, as it
did not fail until the very end, when we realize the crc does not match.
The problematic bytes could be anywhere in the object data.

The first thing I did was pull the broken data out of the packfile. I
needed to know how big the object was, which I found out with:

  $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
  51653873
  51664736

Show-index gives us the list of objects and their offsets. We throw away
everything but the offsets, and then sort them so that our interesting
offset (which we got from the fsck output above) is followed immediately
by the offset of the next object.
Now we know that the object data is 10863 bytes long, and we can grab it
with:

  dd if=$pack of=object bs=1 skip=51653873 count=10863

I inspected a hexdump of the data, looking for any obvious bogosity
(e.g., a 4K run of zeroes would be a good sign of filesystem
corruption). But everything looked pretty reasonable.

Note that the "object" file isn't fit for feeding straight to zlib; it
has the git packed object header, which is variable-length. We want to
strip that off so we can start playing with the zlib data directly. You
can either work your way through it manually (the format is described in
Documentation/technical/pack-format.txt), or you can walk through it in
a debugger. I did the latter, creating a valid pack like:

  # pack magic and version
  printf 'PACK\0\0\0\2' >tmp.pack
  # pack has one object
  printf '\0\0\0\1' >>tmp.pack
  # now add our object data
  cat object >>tmp.pack
  # and then append the pack trailer
  /path/to/git.git/test-sha1 -b <tmp.pack >trailer
  cat trailer >>tmp.pack

and then running "git index-pack tmp.pack" in the debugger (stop at
unpack_raw_entry). Doing this, I found that there were 3 bytes of header
(and the header itself had a sane type and size). So I stripped those
off with:

  dd if=object of=zlib bs=1 skip=3

I ran the result through zlib's inflate using a custom C program. And
while it did report the error, I did get the right number of output
bytes (i.e., it matched git's size header that we decoded above). But
feeding the result back to "git hash-object" didn't produce the same
sha1. So there were some wrong bytes, but I didn't know which.

The file happened to be C source code, so I hoped I could notice
something obviously wrong with it, but I didn't. I even got it to
compile!

I also tried comparing it to other versions of the same path in the
repository, hoping that there would be some part of the diff that didn't
make sense.
Unfortunately, this happened to be the only revision of this particular
file in the repository, so I had nothing to compare against.

So I took a different approach. Working under the guess that the
corruption was limited to a single byte, I wrote a program to munge each
byte individually, and try inflating the result. Since the object was
only 10K compressed, that worked out to about 2.5M attempts, which took
a few minutes.

The program I used is here:

-- >8 --
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <zlib.h>

static int try_zlib(unsigned char *buf, int len)
{
	/* make this absurdly large so we don't have to loop */
	static unsigned char out[1024*1024];
	z_stream z;
	int ret;

	memset(&z, 0, sizeof(z));
	inflateInit(&z);

	z.next_in = buf;
	z.avail_in = len;
	z.next_out = out;
	z.avail_out = sizeof(out);

	ret = inflate(&z, 0);
	inflateEnd(&z);
	return ret >= 0;
}

/* eye candy */
static int counter = 0;
static void progress(int sig)
{
	fprintf(stderr, "\r%d", counter);
	alarm(1);
}

int main(void)
{
	/* oversized so we can read the whole buffer in */
	unsigned char buf[1024*1024];
	int len;
	unsigned i, j;

	signal(SIGALRM, progress);
	alarm(1);

	len = read(0, buf, sizeof(buf));
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];
		for (j = 0; j <= 0xff; j++) {
			buf[i] = j;

			counter++;
			if (try_zlib(buf, len))
				printf("i=%d, j=%x\n", i, j);
		}
		buf[i] = c;
	}

	alarm(0);
	fprintf(stderr, "\n");
	return 0;
}
-- >8 --

I compiled and ran with:

  gcc -Wall -Werror -O3 munge.c -o munge -lz
  ./munge <zlib

There were a few false positives early on (if you write "no data" in the
zlib header, zlib thinks it's just fine :) ). But I got a hit about
halfway through:

  i=5642, j=c7

I let it run to completion, and got a few more hits at the end (where it
was munging the crc to match our broken data). So there was a good
chance this middle hit was the source of the problem.
I confirmed by tweaking the byte in a hex editor, zlib inflating the
result (no errors!), and then piping the output into "git hash-object",
which reported the sha1 of the broken object. Success!

I fixed the packfile itself with:

  chmod +w $pack
  printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
  chmod -w $pack

The '\xc7' comes from the replacement byte our "munge" program found.
The offset 51659518 is derived by taking the original object offset
(51653873), adding the replacement offset found by "munge" (5642), and
then adding back in the 3 bytes of git header we stripped.

After that, "git fsck" ran clean.

As for the corruption itself, I was lucky that it was indeed a single
byte. In fact, it turned out to be a single bit. The byte 0xc7 was
corrupted to 0xc5. So presumably it was caused by faulty hardware, or a
cosmic ray.

And the aborted attempt to look at the inflated output to see what was
wrong? I could have looked forever and never found it. Here's the diff
between what the corrupted data inflates to, versus the real data:

  - cp = strtok (arg, "+");
  + cp = strtok (arg, ".");

It tweaked one byte and still ended up as valid, readable C that just
happened to do something totally different! One takeaway is that on a
less unlucky day, looking at the zlib output might have actually been
helpful, as most random changes would actually break the C code.

But more importantly, git's hashing and checksumming noticed a problem
that easily could have gone undetected in another system. The result
still compiled, but would have caused an interesting bug (that would
have been blamed on some random commit).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem
  2013-10-16  8:34 pack corruption post-mortem Jeff King
@ 2013-10-16  8:59 ` Duy Nguyen
  2013-10-16 15:41 ` Martin Fick
  ` (2 subsequent siblings)

3 siblings, 0 replies; 16+ messages in thread
From: Duy Nguyen @ 2013-10-16  8:59 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List

On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote:
> I was recently presented with a repository with a corrupted packfile,
> and was asked if the data was recoverable. This post-mortem describes
> the steps I took to investigate and fix the problem. I thought others
> might find the process interesting, and it might help somebody in the
> same situation.

It's like reading an LWN article. Thank you.
--
Duy
* Re: pack corruption post-mortem
  2013-10-16  8:34 pack corruption post-mortem Jeff King
  2013-10-16  8:59 ` Duy Nguyen
@ 2013-10-16 15:41 ` Martin Fick
  2013-10-17  0:35   ` Jeff King
  2013-10-17  1:06   ` Duy Nguyen
  2013-10-19 10:32 ` Duy Nguyen
  2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King

3 siblings, 2 replies; 16+ messages in thread
From: Martin Fick @ 2013-10-16 15:41 UTC (permalink / raw)
To: Jeff King; +Cc: git

On Wednesday, October 16, 2013 02:34:01 am Jeff King wrote:
> I was recently presented with a repository with a
> corrupted packfile, and was asked if the data was
> recoverable. This post-mortem describes the steps I took
> to investigate and fix the problem. I thought others
> might find the process interesting, and it might help
> somebody in the same situation.

This is awesome Peff, thanks for the great writeup!

I have nightmares about this sort of thing every now and
then, and we even experience some corruption here and there
that needs to be fixed (mainly missing objects when we toy
with different git repack arguments). I cannot help but
wonder, how we can improve git further to either help
diagnose or even fix some of these problems? More inline
below...

> The first thing I did was pull the broken data out of the
> packfile. I needed to know how big the object was, which
> I found out with:
>
>   $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
>   51653873
>   51664736
>
> Show-index gives us the list of objects and their
> offsets. We throw away everything but the offsets, and
> then sort them so that our interesting offset (which we
> got from the fsck output above) is followed immediately
> by the offset of the next object.
> Now we know that the object data is 10863 bytes long, and
> we can grab it with:
>
>   dd if=$pack of=object bs=1 skip=51653873 count=10863

Is there a current plumbing command that should be enhanced
to be able to do the 2 steps above directly for people
debugging (maybe with some new switch)? If not, should we
create one, git show --zlib, or git cat-file --zlib?

> Note that the "object" file isn't fit for feeding
> straight to zlib; it has the git packed object header,
> which is variable-length. We want to strip that off so
> we can start playing with the zlib data directly. You
> can either work your way through it manually (the format
> is described in
> Documentation/technical/pack-format.txt), or you can
> walk through it in a debugger. I did the latter,
> creating a valid pack like:
>
>   # pack magic and version
>   printf 'PACK\0\0\0\2' >tmp.pack
>   # pack has one object
>   printf '\0\0\0\1' >>tmp.pack
>   # now add our object data
>   cat object >>tmp.pack
>   # and then append the pack trailer
>   /path/to/git.git/test-sha1 -b <tmp.pack >trailer
>   cat trailer >>tmp.pack
>
> and then running "git index-pack tmp.pack" in the
> debugger (stop at unpack_raw_entry). Doing this, I found
> that there were 3 bytes of header (and the header itself
> had a sane type and size). So I stripped those off with:
>
>   dd if=object of=zlib bs=1 skip=3

This too feels like something we should be able to do with a
plumbing command eventually?

git zlib-extract

> So I took a different approach. Working under the guess
> that the corruption was limited to a single byte, I
> wrote a program to munge each byte individually, and try
> inflating the result. Since the object was only 10K
> compressed, that worked out to about 2.5M attempts,
> which took a few minutes.

Awesome! Would this make a good new plumbing command, git
zlib-fix?
> I fixed the packfile itself with:
>
>   chmod +w $pack
>   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
>   chmod -w $pack
>
> The '\xc7' comes from the replacement byte our "munge"
> program found. The offset 51659518 is derived by taking
> the original object offset (51653873), adding the
> replacement offset found by "munge" (5642), and then
> adding back in the 3 bytes of git header we stripped.

Another plumbing command needed? git pack-put --zlib?

I am not saying my command suggestions are good, but maybe
they will inspire the right answer?

-Martin
* Re: pack corruption post-mortem
  2013-10-16 15:41 ` Martin Fick
@ 2013-10-17  0:35   ` Jeff King
  2013-10-17 15:47     ` Junio C Hamano
  2013-10-17  1:06   ` Duy Nguyen

1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2013-10-17  0:35 UTC (permalink / raw)
To: Martin Fick; +Cc: git

On Wed, Oct 16, 2013 at 09:41:16AM -0600, Martin Fick wrote:

> I have nightmares about this sort of thing every now and
> then, and we even experience some corruption here and there
> that needs to be fixed (mainly missing objects when we toy
> with different git repack arguments). I cannot help but
> wonder, how we can improve git further to either help
> diagnose or even fix some of these problems? More inline
> below...

In general, I don't think we know enough about patterns of recovery
corruption to say which commands would definitely be worth implementing.
Part of the reason I wrote this up is to document this one case. But
this is the first time in 7 years of git usage that I've had to do this.
So I'd feel a little bit better about sinking time into it after seeing
a few more cases and realizing where the patterns are.

One of the major hassles is that the assumptions you can and can't make
depend on what data you have that _isn't_ corrupted. Do you have a pack
index, or a bare pack? Do you have zlib data that fails the crc, or zlib
data that cannot be parsed?

In this case there was no other copy of the repository. But if you know
the broken object (which we did here), and you can copy it from
elsewhere, then git already will try to find other sources of the object
(loose, or in another pack).

> > dd if=$pack of=object bs=1 skip=51653873 count=10863
>
> Is there a current plumbing command that should be enhanced
> to be able to do the 2 steps above directly for people
> debugging (maybe with some new switch)? If not, should we
> create one, git show --zlib, or git cat-file --zlib?

Most of the git plumbing commands deal with data at the object layer.
This is really about going a step below and saying "Give me the on-disk
representation of the object".

We recently introduced an "%(objectsize:disk)" formatter for cat-file.
The logical extension would be to ask for "%(contents:disk)" or
something. Though what you get would depend on how the object is stored,
so you would need to figure that out to do anything useful with it.

Note that this implies you actually have a packfile index that says
"object XXX is at offset YYY". In some corruption cases, you might have
only a packfile. That is generally enough to generate the index, but if
there is corruption, you cannot actually parse the pack to find out the
sha1 of the objects. So in the worst case, what you really want is
something like "dump the object in packfile X at offset Y". But even
then, you don't know the length of the object. The packfile is a stream,
and the length we calculated is from the index, which depends on the
zlib data parsing in some sane way.

> > and then running "git index-pack tmp.pack" in the
> > debugger (stop at unpack_raw_entry). Doing this, I found
> > that there were 3 bytes of header (and the header itself
> > had a sane type and size). So I stripped those off with:
> >
> >   dd if=object of=zlib bs=1 skip=3
>
> This too feels like something we should be able to do with a
> plumbing command eventually?
>
> git zlib-extract

Perhaps. I think if you had some "extract object at offset X from the
packfile" command, it would be optional to give you the whole thing, or
just the zlib data.

> > So I took a different approach. Working under the guess
> > that the corruption was limited to a single byte, I
> > wrote a program to munge each byte individually, and try
> > inflating the result. Since the object was only 10K
> > compressed, that worked out to about 2.5M attempts,
> > which took a few minutes.
>
> Awesome! Would this make a good new plumbing command, git
> zlib-fix?

I'd like to see it actually work more than once first.
This relies on there being single-byte corruption. Even double-byte
corruption starts to get expensive to brute-force like this. SHA1, by
its nature, requires brute-forcing. But it's possible that the crc, not
being cryptographically secure, could be reverse-engineered to find
likely spots of corruption. I don't know enough about it to say.

> > I fixed the packfile itself with:
> >
> >   chmod +w $pack
> >   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
> >   chmod -w $pack
> >
> > The '\xc7' comes from the replacement byte our "munge"
> > program found. The offset 51659518 is derived by taking
> > the original object offset (51653873), adding the
> > replacement offset found by "munge" (5642), and then
> > adding back in the 3 bytes of git header we stripped.
>
> Another plumbing command needed? git pack-put --zlib?

I think in this case that dd does a nice job of solving the problem.
Some of the stuff I did was very git-specific and required knowledge of
the formats. But this one is really just "replace byte X at offset Y",
and I don't see any need to avoid a general-purpose tool (except that dd
is itself reasonably arcane :) ).

-Peff
* Re: pack corruption post-mortem
  2013-10-17  0:35   ` Jeff King
@ 2013-10-17 15:47     ` Junio C Hamano
  2013-10-25  7:55       ` Jeff King

0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2013-10-17 15:47 UTC (permalink / raw)
To: Jeff King; +Cc: Martin Fick, git

Jeff King <peff@peff.net> writes:

> On Wed, Oct 16, 2013 at 09:41:16AM -0600, Martin Fick wrote:
>
>> I have nightmares about this sort of thing every now and
>> then, and we even experience some corruption here and there
>> that needs to be fixed (mainly missing objects when we toy
>> with different git repack arguments). I cannot help but
>> wonder, how we can improve git further to either help
>> diagnose or even fix some of these problems? More inline
>> below...
>
> In general, I don't think we know enough about patterns of recovery
> corruption to say which commands would definitely be worth implementing.
> Part of the reason I wrote this up is to document this one case. But
> this is the first time in 7 years of git usage that I've had to do this.
> So I'd feel a little bit better about sinking time into it after seeing
> a few more cases and realizing where the patterns are.

There was one area in our Documentation/ set we used to use to keep
this kind of message almost as-is; perhaps this message fits there?
* Re: pack corruption post-mortem
  2013-10-17 15:47     ` Junio C Hamano
@ 2013-10-25  7:55       ` Jeff King

0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2013-10-25  7:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Martin Fick, git

On Thu, Oct 17, 2013 at 08:47:05AM -0700, Junio C Hamano wrote:

> > In general, I don't think we know enough about patterns of recovery
> > corruption to say which commands would definitely be worth implementing.
> > Part of the reason I wrote this up is to document this one case. But
> > this is the first time in 7 years of git usage that I've had to do this.
> > So I'd feel a little bit better about sinking time into it after seeing
> > a few more cases and realizing where the patterns are.
>
> There was one area in our Documentation/ set we used to use to keep
> this kind of message almost as-is; perhaps this message fits there?

I've wondered if that howto section has much value, as they are already
available in the list archive, and often show their age after a while.
Still, I suppose it is a sort of curated list of interesting posts.

Here's my article, all gussied up for the howto directory. Take it or
leave it.

-- >8 --
Subject: [PATCH] howto: add article on recovering a corrupted object

This is an asciidoc-ified version of a corruption post-mortem sent to
the git list. It complements the existing howto article, since it covers
a case where the object couldn't be easily recreated or copied from
elsewhere.
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/Makefile                             |   1 +
 .../howto/recover-corrupted-object-harder.txt      | 242 +++++++++++++++++++++
 2 files changed, 243 insertions(+)
 create mode 100644 Documentation/howto/recover-corrupted-object-harder.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 4f13a23..91a12c7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -53,6 +53,7 @@ SP_ARTICLES += howto/setup-git-server-over-http
 SP_ARTICLES += howto/separating-topic-branches
 SP_ARTICLES += howto/revert-a-faulty-merge
 SP_ARTICLES += howto/recover-corrupted-blob-object
+SP_ARTICLES += howto/recover-corrupted-object-harder
 SP_ARTICLES += howto/rebuild-from-update-hook
 SP_ARTICLES += howto/rebase-from-internal-branch
 SP_ARTICLES += howto/maintain-git
diff --git a/Documentation/howto/recover-corrupted-object-harder.txt b/Documentation/howto/recover-corrupted-object-harder.txt
new file mode 100644
index 0000000..6f33dac
--- /dev/null
+++ b/Documentation/howto/recover-corrupted-object-harder.txt
@@ -0,0 +1,242 @@
+Date: Wed, 16 Oct 2013 04:34:01 -0400
+From: Jeff King <peff@peff.net>
+Subject: pack corruption post-mortem
+Abstract: Recovering a corrupted object when no good copy is available.
+Content-type: text/asciidoc
+
+How to recover an object from scratch
+=====================================
+
+I was recently presented with a repository with a corrupted packfile,
+and was asked if the data was recoverable. This post-mortem describes
+the steps I took to investigate and fix the problem. I thought others
+might find the process interesting, and it might help somebody in the
+same situation.
+
+********************************
+Note: In this case, no good copy of the repository was available. For
+the much easier case where you can get the corrupted object from
+elsewhere, see link:recover-corrupted-blob-object.html[this howto].
+********************************
+
+I started with an fsck, which found a problem with exactly one object
+(I've used $pack and $obj below to keep the output readable, and also
+because I'll refer to them later):
+
+-----------
+ $ git fsck
+ error: $pack SHA1 checksum mismatch
+ error: index CRC mismatch for object $obj from $pack at offset 51653873
+ error: inflate: data stream error (incorrect data check)
+ error: cannot unpack $obj from $pack at offset 51653873
+-----------
+
+The pack checksum failing means a byte is munged somewhere, and it is
+presumably in the object mentioned (since both the index checksum and
+zlib were failing).
+
+Reading the zlib source code, I found that "incorrect data check" means
+that the adler-32 checksum at the end of the zlib data did not match the
+inflated data. So stepping the data through zlib would not help, as it
+did not fail until the very end, when we realize the crc does not match.
+The problematic bytes could be anywhere in the object data.
+
+The first thing I did was pull the broken data out of the packfile. I
+needed to know how big the object was, which I found out with:
+
+------------
+ $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
+ 51653873
+ 51664736
+------------
+
+Show-index gives us the list of objects and their offsets. We throw away
+everything but the offsets, and then sort them so that our interesting
+offset (which we got from the fsck output above) is followed immediately
+by the offset of the next object. Now we know that the object data is
+10863 bytes long, and we can grab it with:
+
+------------
+ dd if=$pack of=object bs=1 skip=51653873 count=10863
+------------
+
+I inspected a hexdump of the data, looking for any obvious bogosity
+(e.g., a 4K run of zeroes would be a good sign of filesystem
+corruption). But everything looked pretty reasonable.
+
+Note that the "object" file isn't fit for feeding straight to zlib; it
+has the git packed object header, which is variable-length. We want to
+strip that off so we can start playing with the zlib data directly. You
+can either work your way through it manually (the format is described in
+link:../technical/pack-format.html[Documentation/technical/pack-format.txt]),
+or you can walk through it in a debugger. I did the latter, creating a
+valid pack like:
+
+------------
+ # pack magic and version
+ printf 'PACK\0\0\0\2' >tmp.pack
+ # pack has one object
+ printf '\0\0\0\1' >>tmp.pack
+ # now add our object data
+ cat object >>tmp.pack
+ # and then append the pack trailer
+ /path/to/git.git/test-sha1 -b <tmp.pack >trailer
+ cat trailer >>tmp.pack
+------------
+
+and then running "git index-pack tmp.pack" in the debugger (stop at
+unpack_raw_entry). Doing this, I found that there were 3 bytes of header
+(and the header itself had a sane type and size). So I stripped those
+off with:
+
+------------
+ dd if=object of=zlib bs=1 skip=3
+------------
+
+I ran the result through zlib's inflate using a custom C program. And
+while it did report the error, I did get the right number of output
+bytes (i.e., it matched git's size header that we decoded above). But
+feeding the result back to "git hash-object" didn't produce the same
+sha1. So there were some wrong bytes, but I didn't know which. The file
+happened to be C source code, so I hoped I could notice something
+obviously wrong with it, but I didn't. I even got it to compile!
+
+I also tried comparing it to other versions of the same path in the
+repository, hoping that there would be some part of the diff that didn't
+make sense. Unfortunately, this happened to be the only revision of this
+particular file in the repository, so I had nothing to compare against.
+
+So I took a different approach. Working under the guess that the
+corruption was limited to a single byte, I wrote a program to munge each
+byte individually, and try inflating the result. Since the object was
+only 10K compressed, that worked out to about 2.5M attempts, which took
+a few minutes.
+
+The program I used is here:
+
+----------------------------------------------
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <signal.h>
+#include <zlib.h>
+
+static int try_zlib(unsigned char *buf, int len)
+{
+	/* make this absurdly large so we don't have to loop */
+	static unsigned char out[1024*1024];
+	z_stream z;
+	int ret;
+
+	memset(&z, 0, sizeof(z));
+	inflateInit(&z);
+
+	z.next_in = buf;
+	z.avail_in = len;
+	z.next_out = out;
+	z.avail_out = sizeof(out);
+
+	ret = inflate(&z, 0);
+	inflateEnd(&z);
+	return ret >= 0;
+}
+
+/* eye candy */
+static int counter = 0;
+static void progress(int sig)
+{
+	fprintf(stderr, "\r%d", counter);
+	alarm(1);
+}
+
+int main(void)
+{
+	/* oversized so we can read the whole buffer in */
+	unsigned char buf[1024*1024];
+	int len;
+	unsigned i, j;
+
+	signal(SIGALRM, progress);
+	alarm(1);
+
+	len = read(0, buf, sizeof(buf));
+	for (i = 0; i < len; i++) {
+		unsigned char c = buf[i];
+		for (j = 0; j <= 0xff; j++) {
+			buf[i] = j;
+
+			counter++;
+			if (try_zlib(buf, len))
+				printf("i=%d, j=%x\n", i, j);
+		}
+		buf[i] = c;
+	}
+
+	alarm(0);
+	fprintf(stderr, "\n");
+	return 0;
+}
+----------------------------------------------
+
+I compiled and ran with:
+
+-------
+ gcc -Wall -Werror -O3 munge.c -o munge -lz
+ ./munge <zlib
+-------
+
+There were a few false positives early on (if you write "no data" in the
+zlib header, zlib thinks it's just fine :) ). But I got a hit about
+halfway through:
+
+-------
+ i=5642, j=c7
+-------
+
+I let it run to completion, and got a few more hits at the end (where it
+was munging the crc to match our broken data). So there was a good
+chance this middle hit was the source of the problem.
+
+I confirmed by tweaking the byte in a hex editor, zlib inflating the
+result (no errors!), and then piping the output into "git hash-object",
+which reported the sha1 of the broken object. Success!
+
+I fixed the packfile itself with:
+
+-------
+ chmod +w $pack
+ printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
+ chmod -w $pack
+-------
+
+The `\xc7` comes from the replacement byte our "munge" program found.
+The offset 51659518 is derived by taking the original object offset
+(51653873), adding the replacement offset found by "munge" (5642), and
+then adding back in the 3 bytes of git header we stripped.
+
+After that, "git fsck" ran clean.
+
+As for the corruption itself, I was lucky that it was indeed a single
+byte. In fact, it turned out to be a single bit. The byte 0xc7 was
+corrupted to 0xc5. So presumably it was caused by faulty hardware, or a
+cosmic ray.
+
+And the aborted attempt to look at the inflated output to see what was
+wrong? I could have looked forever and never found it. Here's the diff
+between what the corrupted data inflates to, versus the real data:
+
+--------------
+ - cp = strtok (arg, "+");
+ + cp = strtok (arg, ".");
+--------------
+
+It tweaked one byte and still ended up as valid, readable C that just
+happened to do something totally different! One takeaway is that on a
+less unlucky day, looking at the zlib output might have actually been
+helpful, as most random changes would actually break the C code.
+
+But more importantly, git's hashing and checksumming noticed a problem
+that easily could have gone undetected in another system. The result
+still compiled, but would have caused an interesting bug (that would
+have been blamed on some random commit).
--
1.8.4.1.898.g8bf8a41.dirty
* Re: pack corruption post-mortem
  2013-10-16 15:41 ` Martin Fick
  2013-10-17  0:35   ` Jeff King
@ 2013-10-17  1:06   ` Duy Nguyen

1 sibling, 0 replies; 16+ messages in thread
From: Duy Nguyen @ 2013-10-17  1:06 UTC (permalink / raw)
To: Martin Fick; +Cc: Jeff King, Git Mailing List

On Wed, Oct 16, 2013 at 10:41 PM, Martin Fick <mfick@codeaurora.org> wrote:
>> and then running "git index-pack tmp.pack" in the
>> debugger (stop at unpack_raw_entry). Doing this, I found
>> that there were 3 bytes of header (and the header itself
>> had a sane type and size). So I stripped those off with:
>>
>>   dd if=object of=zlib bs=1 skip=3
>
> This too feels like something we should be able to do with a
> plumbing command eventually?
>
> git zlib-extract

Not an official plumbing, but I faced similar problems with pack v4. I
needed to verify that the output is correct and low level decoding like
this is generally a good thing to start with. So I wrote test-dump [1]
that can take an offset, a format and try to decode it. It does not
support zlib inflation yet, but adding one should be easy. And because
this is just a test program we don't really need to think hard before
adding something.

[1] http://article.gmane.org/gmane.comp.version-control.git/235388
--
Duy
* Re: pack corruption post-mortem
  2013-10-16  8:34 pack corruption post-mortem Jeff King
  2013-10-16  8:59 ` Duy Nguyen
  2013-10-16 15:41 ` Martin Fick
@ 2013-10-19 10:32 ` Duy Nguyen
  2013-10-19 14:41   ` Nicolas Pitre
  2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King

3 siblings, 1 reply; 16+ messages in thread
From: Duy Nguyen @ 2013-10-19 10:32 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Git Mailing List, Jeff King

On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote:
> I was recently presented with a repository with a corrupted packfile,
> and was asked if the data was recoverable. This post-mortem describes
> the steps I took to investigate and fix the problem. I thought others
> might find the process interesting, and it might help somebody in the
> same situation.
>
> I started with an fsck, which found a problem with exactly one object
> (I've used $pack and $obj below to keep the output readable, and also
> because I'll refer to them later):
>
>   $ git fsck
>   error: $pack SHA1 checksum mismatch
>   error: index CRC mismatch for object $obj from $pack at offset 51653873
>   error: inflate: data stream error (incorrect data check)
>   error: cannot unpack $obj from $pack at offset 51653873

I wonder if we should protect the sha-1 and pathname tables in packv4
with CRC too. A bit flipped in there could cause a stream of corrupt
objects and make it hard to pinpoint the corrupt location..
--
Duy
* Re: pack corruption post-mortem 2013-10-19 10:32 ` Duy Nguyen @ 2013-10-19 14:41 ` Nicolas Pitre 2013-10-19 19:17 ` Shawn Pearce 2013-10-20 4:44 ` Duy Nguyen 0 siblings, 2 replies; 16+ messages in thread From: Nicolas Pitre @ 2013-10-19 14:41 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List, Jeff King On Sat, 19 Oct 2013, Duy Nguyen wrote: > On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote: > > I was recently presented with a repository with a corrupted packfile, > > and was asked if the data was recoverable. This post-mortem describes > > the steps I took to investigate and fix the problem. I thought others > > might find the process interesting, and it might help somebody in the > > same situation. > > > > I started with an fsck, which found a problem with exactly one object > > (I've used $pack and $obj below to keep the output readable, and also > > because I'll refer to them later): > > > > $ git fsck > > error: $pack SHA1 checksum mismatch > > error: index CRC mismatch for object $obj from $pack at offset 51653873 > > error: inflate: data stream error (incorrect data check) > > error: cannot unpack $obj from $pack at offset 51653873 > > I wonder if we should protect the sha-1 and pathname tables in packv4 > with CRC too. A bit flipped in there could cause stream of corrupt > objects and make it hard to pinpoint the corrupt location.. It turns out that we already have this covered. The SHA1 used in the name of the pack file is actually the SHA1 checksum of the SHA1 table. The path and ident tables are already protected by the CRC32 in the zlib deflated stream. Normal objects are also zlib deflated (except for their header) but you need to inflate them in order to have this CRC verified, which the pack data copy tries to avoid. Hence the separate CRC32 in the index file in that case. However the pack v4 tables are very unlikely to be reused as is from one pack to another. 
Nicolas ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem 2013-10-19 14:41 ` Nicolas Pitre @ 2013-10-19 19:17 ` Shawn Pearce 2013-10-20 20:56 ` Nicolas Pitre 2013-10-20 4:44 ` Duy Nguyen 1 sibling, 1 reply; 16+ messages in thread From: Shawn Pearce @ 2013-10-19 19:17 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Duy Nguyen, Git Mailing List, Jeff King On Sat, Oct 19, 2013 at 7:41 AM, Nicolas Pitre <nico@fluxnic.net> wrote: > On Sat, 19 Oct 2013, Duy Nguyen wrote: > >> On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote: >> > I was recently presented with a repository with a corrupted packfile, >> > and was asked if the data was recoverable. This post-mortem describes >> > the steps I took to investigate and fix the problem. I thought others >> > might find the process interesting, and it might help somebody in the >> > same situation. >> > >> > I started with an fsck, which found a problem with exactly one object >> > (I've used $pack and $obj below to keep the output readable, and also >> > because I'll refer to them later): >> > >> > $ git fsck >> > error: $pack SHA1 checksum mismatch >> > error: index CRC mismatch for object $obj from $pack at offset 51653873 >> > error: inflate: data stream error (incorrect data check) >> > error: cannot unpack $obj from $pack at offset 51653873 >> >> I wonder if we should protect the sha-1 and pathname tables in packv4 >> with CRC too. A bit flipped in there could cause stream of corrupt >> objects and make it hard to pinpoint the corrupt location.. > > It turns out that we already have this covered. > > The SHA1 used in the name of the pack file is actually the SHA1 checksum > of the SHA1 table. I continue to believe this naming is wrong. The pack file name should be the SHA1 checksum of the pack data stream, not the SHA1 table. This would allow cleaner update of a repository that was repacked with different compression settings, but identical objects. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem 2013-10-19 19:17 ` Shawn Pearce @ 2013-10-20 20:56 ` Nicolas Pitre 0 siblings, 0 replies; 16+ messages in thread From: Nicolas Pitre @ 2013-10-20 20:56 UTC (permalink / raw) To: Shawn Pearce; +Cc: Duy Nguyen, Git Mailing List, Jeff King On Sat, 19 Oct 2013, Shawn Pearce wrote: > On Sat, Oct 19, 2013 at 7:41 AM, Nicolas Pitre <nico@fluxnic.net> wrote: > > On Sat, 19 Oct 2013, Duy Nguyen wrote: > > > >> On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote: > >> > I was recently presented with a repository with a corrupted packfile, > >> > and was asked if the data was recoverable. This post-mortem describes > >> > the steps I took to investigate and fix the problem. I thought others > >> > might find the process interesting, and it might help somebody in the > >> > same situation. > >> > > >> > I started with an fsck, which found a problem with exactly one object > >> > (I've used $pack and $obj below to keep the output readable, and also > >> > because I'll refer to them later): > >> > > >> > $ git fsck > >> > error: $pack SHA1 checksum mismatch > >> > error: index CRC mismatch for object $obj from $pack at offset 51653873 > >> > error: inflate: data stream error (incorrect data check) > >> > error: cannot unpack $obj from $pack at offset 51653873 > >> > >> I wonder if we should protect the sha-1 and pathname tables in packv4 > >> with CRC too. A bit flipped in there could cause stream of corrupt > >> objects and make it hard to pinpoint the corrupt location.. > > > > It turns out that we already have this covered. > > > > The SHA1 used in the name of the pack file is actually the SHA1 checksum > > of the SHA1 table. > > I continue to believe this naming is wrong. The pack file name should > be the SHA1 checksum of the pack data stream, not the SHA1 table. This > would allow cleaner update of a repository that was repacked with > different compression settings, but identical objects. 
OK, after some thought, I decided that it is best _not_ to rely on the pack name. The pack name currently carries no meaning whatsoever and git works just fine if some packs are arbitrarily named. Your concern about its current naming scheme is certainly legitimate, and I don't want pack v4 to introduce any kind of restrictions here. Furthermore, the SHA1 table is the only pack element whose integrity may not independently be verified at the moment. This makes corruption isolation much harder when receiving a streamed pack. Therefore I decided to introduce a small pack v4 format change with the following patch: ----- >8 From: Nicolas Pitre <nico@fluxnic.net> Date: Sun, 20 Oct 2013 14:52:24 -0400 Subject: [PATCH] pack v4: add a SHA1 checksum to the SHA1 table Every packed element currently has some integrity protection coming from the CRC32 embedded in the zlib deflated stream, except for the SHA1 table. Some bit flip corruption in the SHA1 table may have repercussions on a whole lot of objects and could be very hard to isolate. Let's add some integrity protection on the SHA1 table by terminating it with an additional SHA1 entry being the SHA1 checksum of the table. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index fd2e737..a370f26 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -94,7 +94,9 @@ Git pack format === Pack v4 tables - A table of sorted SHA-1 object names for all objects contained in - the on-disk pack. + the on-disk pack, with a final entry being the SHA1 sum of all the + previous entries. The size of this table is therefore + (nr_objects + 1) * 20 bytes. The SHA-1 table in thin packs must include the omitted objects as well. 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c index caec388..01300d6 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1471,7 +1471,10 @@ static struct packv4_dict *read_dict(void) static void parse_dictionaries(void) { + git_SHA_CTX ctx; + unsigned char table_sha1[20]; int i; + if (!packv4) return; @@ -1485,6 +1488,12 @@ static void parse_dictionaries(void) die(_("wrong order in SHA-1 table at entry %d"), i); } + git_SHA1_Init(&ctx); + git_SHA1_Update(&ctx, sha1_table, 20 * nr_objects_final); + git_SHA1_Final(table_sha1, &ctx); + if (hashcmp(table_sha1, fill_and_use(20)) != 0) + die(_("SHA-1 table checksum mismatch")); + name_dict = read_dict(); path_dict = read_dict(); } diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 9fd5640..eb57ada 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -816,11 +816,18 @@ static void unpack_all(void) use(sizeof(struct pack_header)); if (packv4) { + git_SHA_CTX ctx; + unsigned char table_sha1[20]; sha1_table = xmalloc(20 * nr_objects); for (i = 0; i < nr_objects; i++) { unsigned char *p = sha1_table + i * 20; hashcpy(p, fill_and_use(20)); } + git_SHA1_Init(&ctx); + git_SHA1_Update(&ctx, sha1_table, 20 * nr_objects); + git_SHA1_Final(table_sha1, &ctx); + if (hashcmp(table_sha1, fill_and_use(20)) != 0) + die("SHA-1 table checksum mismatch"); name_dict = read_dict(); path_dict = read_dict(); } diff --git a/packv4-create.c b/packv4-create.c index 14be867..7b51792 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -688,13 +688,20 @@ unsigned long packv4_write_tables(struct sha1file *f, struct pack_idx_entry *objs = v4->all_objs; struct dict_table *commit_ident_table = v4->commit_ident_table; struct dict_table *tree_path_table = v4->tree_path_table; + git_SHA_CTX ctx; + unsigned char table_sha1[20]; unsigned i; unsigned long written = 0; /* The sorted list of object SHA1's is always first */ - for (i = 0; i < nr_objects; i++) + git_SHA1_Init(&ctx); + for (i = 0; i < 
nr_objects; i++) { + git_SHA1_Update(&ctx, objs[i].sha1, 20); sha1write(f, objs[i].sha1, 20); - written = 20 * nr_objects; + } + git_SHA1_Final(table_sha1, &ctx); + sha1write(f, table_sha1, 20); + written = 20 * (nr_objects + 1); /* Then the commit dictionary table */ written += write_dict_table(f, commit_ident_table, diff --git a/packv4-parse.c b/packv4-parse.c index 31c89c7..e6f5028 100644 --- a/packv4-parse.c +++ b/packv4-parse.c @@ -128,7 +128,7 @@ static struct packv4_dict *load_dict(struct packed_git *p, off_t *offset) static void load_ident_dict(struct packed_git *p) { - off_t offset = 12 + p->num_objects * 20; + off_t offset = 12 + (p->num_objects + 1) * 20; struct packv4_dict *names = load_dict(p, &offset); if (!names) die("bad pack name dictionary in %s", p->pack_name); Nicolas ^ permalink raw reply related [flat|nested] 16+ messages in thread
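[Editorial note: the on-disk scheme this patch introduces can be modelled in a few lines. The sketch below is an illustrative Python model only, with made-up helper names; the real pack v4 implementation is the C above. The sorted 20-byte entries are followed by one extra entry holding the SHA-1 of the table itself, so a reader can validate the table before trusting it.]

```python
import hashlib

def write_sha1_table(oids):
    """Sorted SHA-1 entries plus a trailing SHA-1 over the table:
    (nr_objects + 1) * 20 bytes, as in the pack-format.txt hunk above."""
    table = b"".join(sorted(oids))
    return table + hashlib.sha1(table).digest()

def read_sha1_table(buf, nr_objects):
    """Split the table from its trailing checksum and verify it."""
    table = buf[:20 * nr_objects]
    trailer = buf[20 * nr_objects:20 * (nr_objects + 1)]
    if hashlib.sha1(table).digest() != trailer:
        raise ValueError("SHA-1 table checksum mismatch")
    return [table[i:i + 20] for i in range(0, len(table), 20)]

# Round-trip three fake object ids, then corrupt a single bit:
oids = [hashlib.sha1(bytes([i])).digest() for i in range(3)]
buf = write_sha1_table(oids)
assert read_sha1_table(buf, 3) == sorted(oids)

bad = bytearray(buf)
bad[5] ^= 0x40          # one flipped bit anywhere in the table is now caught
try:
    read_sha1_table(bytes(bad), 3)
except ValueError as e:
    print(e)            # SHA-1 table checksum mismatch
```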
* Re: pack corruption post-mortem 2013-10-19 14:41 ` Nicolas Pitre 2013-10-19 19:17 ` Shawn Pearce @ 2013-10-20 4:44 ` Duy Nguyen 2013-10-20 21:08 ` Nicolas Pitre 1 sibling, 1 reply; 16+ messages in thread From: Duy Nguyen @ 2013-10-20 4:44 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Git Mailing List, Jeff King On Sat, Oct 19, 2013 at 9:41 PM, Nicolas Pitre <nico@fluxnic.net> wrote: > On Sat, 19 Oct 2013, Duy Nguyen wrote: > The SHA1 used in the name of the pack file is actually the SHA1 checksum > of the SHA1 table. > > The path and ident tables are already protected by the CRC32 in the zlib > deflated stream. > > Normal objects are also zlib deflated (except for their header) but you > need to inflate them in order to have this CRC verified, which the pack > data copy tries to avoid. Hence the separate CRC32 in the index file in > that case. OK slight change in the subject, what about reading code (i.e. sha1_file.c)? With v2 crc32 is verified by object inflate code. With v4 trees or commits, because we store some (or all) data outside of the deflated stream, we will not benefit from crc32 verification previously done for all trees and commits. Should we perform an explicit crc32 check when reading v4 trees and commits (and maybe verify the sha-1 table too)? -- Duy ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem 2013-10-20 4:44 ` Duy Nguyen @ 2013-10-20 21:08 ` Nicolas Pitre 0 siblings, 0 replies; 16+ messages in thread From: Nicolas Pitre @ 2013-10-20 21:08 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List, Jeff King On Sun, 20 Oct 2013, Duy Nguyen wrote: > On Sat, Oct 19, 2013 at 9:41 PM, Nicolas Pitre <nico@fluxnic.net> wrote: > > On Sat, 19 Oct 2013, Duy Nguyen wrote: > > The SHA1 used in the name of the pack file is actually the SHA1 checksum > > of the SHA1 table. > > > > The path and ident tables are already protected by the CRC32 in the zlib > > deflated stream. > > > > Normal objects are also zlib deflated (except for their header) but you > > need to inflate them in order to have this CRC verified, which the pack > > data copy tries to avoid. Hence the separate CRC32 in the index file in > > that case. > > OK slight change in the subject, what about reading code (i.e. > sha1_file.c)? With v2 crc32 is verified by object inflate code. With > v4 trees or commits, because we store some (or all) data outside of > the deflated stream, we will not benefit from crc32 verification > previously done for all trees and commits. Should we perform an explicit > crc32 check when reading v4 trees and commits (and maybe verify the > sha-1 table too)? I suppose that we should... at some point. I did the SHA1 table check only for index-pack and unpack-objects in my latest patch. Adding it to check_packed_git_idx() as well should be trivial. I'm not sure about the best way to do systematic checks on tree objects though. We have both the CRC32 recorded in the index file and the object SHA1 recorded in the SHA1 table. But any of them needs to be computed as we walk the object, and we currently haven't found the best way to do that yet. So I'd suggest postponing this until the tree walk is properly implemented to perform well first. Nicolas ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] howto: document more tools for recovery corruption 2013-10-16 8:34 pack corruption post-mortem Jeff King ` (2 preceding siblings ...) 2013-10-19 10:32 ` Duy Nguyen @ 2015-04-01 21:08 ` Jeff King 2015-04-01 22:21 ` Junio C Hamano 3 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2015-04-01 21:08 UTC (permalink / raw) To: git Long ago, I documented a corruption recovery I did and gave some C code that I used to help find a flipped bit. I had to fix a similar case recently, and I ended up writing a few more tools. I hope nobody ever has to use these, but it does not hurt to share them, just in case. Signed-off-by: Jeff King <peff@peff.net> --- .../howto/recover-corrupted-object-harder.txt | 237 +++++++++++++++++++++ 1 file changed, 237 insertions(+) diff --git a/Documentation/howto/recover-corrupted-object-harder.txt b/Documentation/howto/recover-corrupted-object-harder.txt index 23e685d..9c4cd09 100644 --- a/Documentation/howto/recover-corrupted-object-harder.txt +++ b/Documentation/howto/recover-corrupted-object-harder.txt @@ -240,3 +240,240 @@ But more importantly, git's hashing and checksumming noticed a problem that easily could have gone undetected in another system. The result still compiled, but would have caused an interesting bug (that would have been blamed on some random commit). + + +The adventure continues... +-------------------------- + +I ended up doing this again! Same entity, new hardware. The assumption +at this point is that the old disk corrupted the packfile, and then the +corruption was migrated to the new hardware (because it was done by +rsync or similar, and no fsck was done at the time of migration). + +This time, the affected blob was over 20 megabytes, which was far too +large to do a brute-force on. I followed the instructions above to +create the `zlib` file. I then used the `inflate` program below to pull +the corrupted data from that. 
Examining that output gave me a hint about +where in the file the corruption was. But now I was working with the +file itself, not the zlib contents. So knowing the sha1 of the object +and the approximate area of the corruption, I used the `sha1-munge` +program below to brute-force the correct byte. + +Here's the inflate program (it's essentially `gunzip` but without the +`.gz` header processing): + +-------------------------- +#include <stdio.h> +#include <string.h> +#include <zlib.h> +#include <stdlib.h> + +int main(int argc, char **argv) +{ + /* + * oversized so we can read the whole buffer in; + * this could actually be switched to streaming + * to avoid any memory limitations + */ + static unsigned char buf[25 * 1024 * 1024]; + static unsigned char out[25 * 1024 * 1024]; + int len; + z_stream z; + int ret; + + len = read(0, buf, sizeof(buf)); + memset(&z, 0, sizeof(z)); + inflateInit(&z); + + z.next_in = buf; + z.avail_in = len; + z.next_out = out; + z.avail_out = sizeof(out); + + ret = inflate(&z, 0); + if (ret != Z_OK && ret != Z_STREAM_END) + fprintf(stderr, "initial inflate failed (%d)\n", ret); + + fprintf(stderr, "outputting %lu bytes", z.total_out); + fwrite(out, 1, z.total_out, stdout); + return 0; +} +-------------------------- + +And here is the `sha1-munge` program: + +-------------------------- +#include <stdio.h> +#include <unistd.h> +#include <string.h> +#include <signal.h> +#include <openssl/sha.h> +#include <stdlib.h> + +/* eye candy */ +static int counter = 0; +static void progress(int sig) +{ + fprintf(stderr, "\r%d", counter); + alarm(1); +} + +static const signed char hexval_table[256] = { + -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 18-1f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 20-27 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 28-2f */ + 0, 1, 2, 3, 4, 5, 6, 7, /* 30-37 */ + 8, 9, -1, -1, -1, -1, -1, -1, /* 38-3f */ 
+ -1, 10, 11, 12, 13, 14, 15, -1, /* 40-47 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 48-4f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 50-57 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 58-5f */ + -1, 10, 11, 12, 13, 14, 15, -1, /* 60-67 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 68-6f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 70-77 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 78-7f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 80-87 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 88-8f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 90-97 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 98-9f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* a0-a7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* a8-af */ + -1, -1, -1, -1, -1, -1, -1, -1, /* b0-b7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* b8-bf */ + -1, -1, -1, -1, -1, -1, -1, -1, /* c0-c7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* c8-cf */ + -1, -1, -1, -1, -1, -1, -1, -1, /* d0-d7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* d8-df */ + -1, -1, -1, -1, -1, -1, -1, -1, /* e0-e7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* e8-ef */ + -1, -1, -1, -1, -1, -1, -1, -1, /* f0-f7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* f8-ff */ +}; + +static inline unsigned int hexval(unsigned char c) +{ +return hexval_table[c]; +} + +static int get_sha1_hex(const char *hex, unsigned char *sha1) +{ + int i; + for (i = 0; i < 20; i++) { + unsigned int val; + /* + * hex[1]=='\0' is caught when val is checked below, + * but if hex[0] is NUL we have to avoid reading + * past the end of the string: + */ + if (!hex[0]) + return -1; + val = (hexval(hex[0]) << 4) | hexval(hex[1]); + if (val & ~0xff) + return -1; + *sha1++ = val; + hex += 2; + } + return 0; +} + +int main(int argc, char **argv) +{ + /* oversized so we can read the whole buffer in */ + static unsigned char buf[25 * 1024 * 1024]; + char header[32]; + int header_len; + unsigned char have[20], want[20]; + int start, len; + SHA_CTX orig; + unsigned i, j; + + if (!argv[1] || get_sha1_hex(argv[1], want)) { + fprintf(stderr, "usage: sha1-munge <sha1> [start] <file.in\n"); 
+ return 1; + } + + if (argv[2]) + start = atoi(argv[2]); + else + start = 0; + + len = read(0, buf, sizeof(buf)); + header_len = sprintf(header, "blob %d", len) + 1; + fprintf(stderr, "using header: %s\n", header); + + /* + * We keep a running sha1 so that if you are munging + * near the end of the file, we do not have to re-sha1 + * the unchanged earlier bytes + */ + SHA1_Init(&orig); + SHA1_Update(&orig, header, header_len); + if (start) + SHA1_Update(&orig, buf, start); + + signal(SIGALRM, progress); + alarm(1); + + for (i = start; i < len; i++) { + unsigned char c; + SHA_CTX x; + +#if 0 + /* + * deletion -- this would not actually work in practice, + * I think, because we've already committed to a + * particular size in the header. Ditto for addition + * below. In those cases, you'd have to do the whole + * sha1 from scratch, or possibly keep three running + * "orig" sha1 computations going. + */ + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, buf + i + 1, len - i - 1); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, deletion\n", i); +#endif + + /* + * replacement -- note that this tries each of the 256 + * possible bytes. If you suspect a single-bit flip, + * it would be much shorter to just try the 8 + * bit-flipped variants. 
+ */ + c = buf[i]; + for (j = 0; j <= 0xff; j++) { + buf[i] = j; + + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, buf + i, len - i); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, j=%02x\n", i, j); + } + buf[i] = c; + +#if 0 + /* addition */ + for (j = 0; j <= 0xff; j++) { + unsigned char extra = j; + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, &extra, 1); + SHA1_Update(&x, buf + i, len - i); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, addition=%02x", i, j); + } +#endif + + SHA1_Update(&orig, buf + i, 1); + counter++; + } + + alarm(0); + fprintf(stderr, "\r%d\n", counter); + return 0; +} +-------------------------- -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 16+ messages in thread
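[Editorial note: the comment in sha1-munge observes that a suspected single-bit flip needs only 8 candidates per byte rather than 256. Below is a sketch of that shortcut as a made-up Python helper, not part of the patch, using the same running-hash trick so each candidate only re-hashes the suffix.]

```python
import hashlib

def find_bit_flip(data, want_hex, start=0):
    """Search for the single flipped bit that makes the git blob
    "blob <len>\\0<data>" hash to want_hex. A running SHA-1 covers the
    header and the bytes already ruled out, so each candidate only
    re-hashes the remaining suffix, as sha1-munge does."""
    buf = bytearray(data)
    running = hashlib.sha1(b"blob %d\x00" % len(buf))
    running.update(bytes(buf[:start]))
    for i in range(start, len(buf)):
        orig = buf[i]
        for bit in range(8):              # 8 flips, not 256 byte values
            buf[i] = orig ^ (1 << bit)
            h = running.copy()
            h.update(bytes(buf[i:]))
            if h.hexdigest() == want_hex:
                return i, buf[i]          # offset and corrected byte
        buf[i] = orig
        running.update(bytes(buf[i:i + 1]))
    return None

# Corrupt one bit of a small "blob" and recover it:
good = b"some file content\n" * 10
want = hashlib.sha1(b"blob %d\x00" % len(good) + good).hexdigest()
corrupt = bytearray(good)
corrupt[37] ^= 0x10
print(find_bit_flip(bytes(corrupt), want))   # prints the offset and corrected byte
```

The `running.copy()` call is what keeps this linear-ish in practice: bytes before position `i` are hashed exactly once, which matters when the corruption is near the end of a large file.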
* Re: [PATCH] howto: document more tools for recovery corruption 2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King @ 2015-04-01 22:21 ` Junio C Hamano 2015-04-02 0:49 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2015-04-01 22:21 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > Long ago, I documented a corruption recovery I did and gave > some C code that I used to help find a flipped bit. I had > to fix a similar case recently, and I ended up writing a few > more tools. I hope nobody ever has to use these, but it > does not hurt to share them, just in case. I am having a hard time deciding if I should take the Date: header of the patch e-mail into consideration. The munge thing looks serious enough, though. > Signed-off-by: Jeff King <peff@peff.net> > --- > .../howto/recover-corrupted-object-harder.txt | 237 +++++++++++++++++++++ > 1 file changed, 237 insertions(+) > > diff --git a/Documentation/howto/recover-corrupted-object-harder.txt > b/Documentation/howto/recover-corrupted-object-harder.txt > index 23e685d..9c4cd09 100644 > --- a/Documentation/howto/recover-corrupted-object-harder.txt > +++ b/Documentation/howto/recover-corrupted-object-harder.txt > @@ -240,3 +240,240 @@ But more importantly, git's hashing and checksumming noticed a problem > that easily could have gone undetected in another system. The result > still compiled, but would have caused an interesting bug (that would > have been blamed on some random commit). > + > + > +The adventure continues... > +-------------------------- > + > +I ended up doing this again! Same entity, new hardware. The assumption > +at this point is that the old disk corrupted the packfile, and then the > +corruption was migrated to the new hardware (because it was done by > +rsync or similar, and no fsck was done at the time of migration). 
> + > +This time, the affected blob was over 20 megabytes, which was far too > +large to do a brute-force on. I followed the instructions above to > +create the `zlib` file. I then used the `inflate` program below to pull > +the corrupted data from that. Examining that output gave me a hint about > +where in the file the corruption was. But now I was working with the > +file itself, not the zlib contents. So knowing the sha1 of the object > +and the approximate area of the corruption, I used the `sha1-munge` > +program below to brute-force the correct byte. > + > +Here's the inflate program (it's essentially `gunzip` but without the > +`.gz` header processing): > + > +-------------------------- > +#include <stdio.h> > +#include <string.h> > +#include <zlib.h> > +#include <stdlib.h> > + > +int main(int argc, char **argv) > +{ > + /* > + * oversized so we can read the whole buffer in; > + * this could actually be switched to streaming > + * to avoid any memory limitations > + */ > + static unsigned char buf[25 * 1024 * 1024]; > + static unsigned char out[25 * 1024 * 1024]; > + int len; > + z_stream z; > + int ret; > + > + len = read(0, buf, sizeof(buf)); > + memset(&z, 0, sizeof(z)); > + inflateInit(&z); > + > + z.next_in = buf; > + z.avail_in = len; > + z.next_out = out; > + z.avail_out = sizeof(out); > + > + ret = inflate(&z, 0); > + if (ret != Z_OK && ret != Z_STREAM_END) > + fprintf(stderr, "initial inflate failed (%d)\n", ret); > + > + fprintf(stderr, "outputting %lu bytes", z.total_out); > + fwrite(out, 1, z.total_out, stdout); > + return 0; > +} > +-------------------------- > + > +And here is the `sha1-munge` program: > + > +-------------------------- > +#include <stdio.h> > +#include <unistd.h> > +#include <string.h> > +#include <signal.h> > +#include <openssl/sha.h> > +#include <stdlib.h> > + > +/* eye candy */ > +static int counter = 0; > +static void progress(int sig) > +{ > + fprintf(stderr, "\r%d", counter); > + alarm(1); > +} > + > +static const signed 
char hexval_table[256] = { > + -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 18-1f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 20-27 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 28-2f */ > + 0, 1, 2, 3, 4, 5, 6, 7, /* 30-37 */ > + 8, 9, -1, -1, -1, -1, -1, -1, /* 38-3f */ > + -1, 10, 11, 12, 13, 14, 15, -1, /* 40-47 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 48-4f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 50-57 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 58-5f */ > + -1, 10, 11, 12, 13, 14, 15, -1, /* 60-67 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 68-67 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 70-77 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 78-7f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 80-87 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 88-8f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 90-97 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 98-9f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* a0-a7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* a8-af */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* b0-b7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* b8-bf */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* c0-c7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* c8-cf */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* d0-d7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* d8-df */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* e0-e7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* e8-ef */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* f0-f7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* f8-ff */ > +}; > + > +static inline unsigned int hexval(unsigned char c) > +{ > +return hexval_table[c]; > +} > + > +static int get_sha1_hex(const char *hex, unsigned char *sha1) > +{ > + int i; > + for (i = 0; i < 20; i++) { > + unsigned int val; > + /* > + * hex[1]=='\0' is caught when val is checked below, > + * but if hex[0] is NUL we have to avoid reading > + * past the end of the string: > + */ > + if (!hex[0]) > + return -1; > + val = 
(hexval(hex[0]) << 4) | hexval(hex[1]); > + if (val & ~0xff) > + return -1; > + *sha1++ = val; > + hex += 2; > + } > + return 0; > +} > + > +int main(int argc, char **argv) > +{ > + /* oversized so we can read the whole buffer in */ > + static unsigned char buf[25 * 1024 * 1024]; > + char header[32]; > + int header_len; > + unsigned char have[20], want[20]; > + int start, len; > + SHA_CTX orig; > + unsigned i, j; > + > + if (!argv[1] || get_sha1_hex(argv[1], want)) { > + fprintf(stderr, "usage: sha1-munge <sha1> [start] <file.in\n"); > + return 1; > + } > + > + if (argv[2]) > + start = atoi(argv[2]); > + else > + start = 0; > + > + len = read(0, buf, sizeof(buf)); > + header_len = sprintf(header, "blob %d", len) + 1; > + fprintf(stderr, "using header: %s\n", header); > + > + /* > + * We keep a running sha1 so that if you are munging > + * near the end of the file, we do not have to re-sha1 > + * the unchanged earlier bytes > + */ > + SHA1_Init(&orig); > + SHA1_Update(&orig, header, header_len); > + if (start) > + SHA1_Update(&orig, buf, start); > + > + signal(SIGALRM, progress); > + alarm(1); > + > + for (i = start; i < len; i++) { > + unsigned char c; > + SHA_CTX x; > + > +#if 0 > + /* > + * deletion -- this would not actually work in practice, > + * I think, because we've already committed to a > + * particular size in the header. Ditto for addition > + * below. In those cases, you'd have to do the whole > + * sha1 from scratch, or possibly keep three running > + * "orig" sha1 computations going. > + */ > + memcpy(&x, &orig, sizeof(x)); > + SHA1_Update(&x, buf + i + 1, len - i - 1); > + SHA1_Final(have, &x); > + if (!memcmp(have, want, 20)) > + printf("i=%d, deletion\n", i); > +#endif > + > + /* > + * replacement -- note that this tries each of the 256 > + * possible bytes. If you suspect a single-bit flip, > + * it would be much shorter to just try the 8 > + * bit-flipped variants. 
> + */ > + c = buf[i]; > + for (j = 0; j <= 0xff; j++) { > + buf[i] = j; > + > + memcpy(&x, &orig, sizeof(x)); > + SHA1_Update(&x, buf + i, len - i); > + SHA1_Final(have, &x); > + if (!memcmp(have, want, 20)) > + printf("i=%d, j=%02x\n", i, j); > + } > + buf[i] = c; > + > +#if 0 > + /* addition */ > + for (j = 0; j <= 0xff; j++) { > + unsigned char extra = j; > + memcpy(&x, &orig, sizeof(x)); > + SHA1_Update(&x, &extra, 1); > + SHA1_Update(&x, buf + i, len - i); > + SHA1_Final(have, &x); > + if (!memcmp(have, want, 20)) > + printf("i=%d, addition=%02x", i, j); > + } > +#endif > + > + SHA1_Update(&orig, buf + i, 1); > + counter++; > + } > + > + alarm(0); > + fprintf(stderr, "\r%d\n", counter); > + return 0; > +} > +-------------------------- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] howto: document more tools for recovery corruption 2015-04-01 22:21 ` Junio C Hamano @ 2015-04-02 0:49 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2015-04-02 0:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, Apr 01, 2015 at 03:21:16PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Long ago, I documented a corruption recovery I did and gave > > some C code that I used to help find a flipped bit. I had > > to fix a similar case recently, and I ended up writing a few > > more tools. I hope nobody ever has to use these, but it > > does not hurt to share them, just in case. > > I am having a hard time deciding if I should take the Date: header > of the patch e-mail into consideration. The munge thing looks > serious enough, though. Heh, no, this is sadly a serious thing that I did today (but I was able to detect and correct a single flipped bit in a 60MB packfile, which is kind of neat, I guess). I hesitated sending them at all because they are not really note-worthy. OTOH, during today's exercise I found the instructions and sample program I had written last time to be very useful, so perhaps it can help somebody (or even me) at some later date. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-04-02 0:50 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-10-16 8:34 pack corruption post-mortem Jeff King 2013-10-16 8:59 ` Duy Nguyen 2013-10-16 15:41 ` Martin Fick 2013-10-17 0:35 ` Jeff King 2013-10-17 15:47 ` Junio C Hamano 2013-10-25 7:55 ` Jeff King 2013-10-17 1:06 ` Duy Nguyen 2013-10-19 10:32 ` Duy Nguyen 2013-10-19 14:41 ` Nicolas Pitre 2013-10-19 19:17 ` Shawn Pearce 2013-10-20 20:56 ` Nicolas Pitre 2013-10-20 4:44 ` Duy Nguyen 2013-10-20 21:08 ` Nicolas Pitre 2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King 2015-04-01 22:21 ` Junio C Hamano 2015-04-02 0:49 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).