* pack corruption post-mortem
@ 2013-10-16  8:34 Jeff King
  2013-10-16  8:59 ` Duy Nguyen
  ` (3 more replies)

0 siblings, 4 replies; 16+ messages in thread
From: Jeff King @ 2013-10-16  8:34 UTC (permalink / raw)
To: git

I was recently presented with a repository with a corrupted packfile,
and was asked if the data was recoverable. This post-mortem describes
the steps I took to investigate and fix the problem. I thought others
might find the process interesting, and it might help somebody in the
same situation.

I started with an fsck, which found a problem with exactly one object
(I've used $pack and $obj below to keep the output readable, and also
because I'll refer to them later):

  $ git fsck
  error: $pack SHA1 checksum mismatch
  error: index CRC mismatch for object $obj from $pack at offset 51653873
  error: inflate: data stream error (incorrect data check)
  error: cannot unpack $obj from $pack at offset 51653873

The pack checksum failing means a byte is munged somewhere, and it is
presumably in the object mentioned (since both the index checksum and
zlib were failing).

Reading the zlib source code, I found that "incorrect data check" means
that the adler-32 checksum at the end of the zlib data did not match the
inflated data. So stepping the data through zlib would not help, as it
did not fail until the very end, when we realize the crc does not match.
The problematic bytes could be anywhere in the object data.

The first thing I did was pull the broken data out of the packfile. I
needed to know how big the object was, which I found out with:

  $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
  51653873
  51664736

Show-index gives us the list of objects and their offsets. We throw away
everything but the offsets, and then sort them so that our interesting
offset (which we got from the fsck output above) is followed immediately
by the offset of the next object.
Now we know that the object data is 10863 bytes long, and we can grab it
with:

  dd if=$pack of=object bs=1 skip=51653873 count=10863

I inspected a hexdump of the data, looking for any obvious bogosity
(e.g., a 4K run of zeroes would be a good sign of filesystem
corruption). But everything looked pretty reasonable.

Note that the "object" file isn't fit for feeding straight to zlib; it
has the git packed object header, which is variable-length. We want to
strip that off so we can start playing with the zlib data directly. You
can either work your way through it manually (the format is described in
Documentation/technical/pack-format.txt), or you can walk through it in
a debugger. I did the latter, creating a valid pack like:

  # pack magic and version
  printf 'PACK\0\0\0\2' >tmp.pack
  # pack has one object
  printf '\0\0\0\1' >>tmp.pack
  # now add our object data
  cat object >>tmp.pack
  # and then append the pack trailer
  /path/to/git.git/test-sha1 -b <tmp.pack >trailer
  cat trailer >>tmp.pack

and then running "git index-pack tmp.pack" in the debugger (stop at
unpack_raw_entry). Doing this, I found that there were 3 bytes of header
(and the header itself had a sane type and size). So I stripped those
off with:

  dd if=object of=zlib bs=1 skip=3

I ran the result through zlib's inflate using a custom C program. And
while it did report the error, I did get the right number of output
bytes (i.e., it matched git's size header that we decoded above). But
feeding the result back to "git hash-object" didn't produce the same
sha1. So there were some wrong bytes, but I didn't know which.

The file happened to be C source code, so I hoped I could notice
something obviously wrong with it, but I didn't. I even got it to
compile!

I also tried comparing it to other versions of the same path in the
repository, hoping that there would be some part of the diff that didn't
make sense.
Unfortunately, this happened to be the only revision of this particular
file in the repository, so I had nothing to compare against.

So I took a different approach. Working under the guess that the
corruption was limited to a single byte, I wrote a program to munge each
byte individually, and try inflating the result. Since the object was
only 10K compressed, that worked out to about 2.5M attempts, which took
a few minutes.

The program I used is here:

-- >8 --
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <zlib.h>

static int try_zlib(unsigned char *buf, int len)
{
	/* make this absurdly large so we don't have to loop */
	static unsigned char out[1024*1024];
	z_stream z;
	int ret;

	memset(&z, 0, sizeof(z));
	inflateInit(&z);

	z.next_in = buf;
	z.avail_in = len;
	z.next_out = out;
	z.avail_out = sizeof(out);

	ret = inflate(&z, 0);
	inflateEnd(&z);
	return ret >= 0;
}

/* eye candy */
static int counter = 0;
static void progress(int sig)
{
	fprintf(stderr, "\r%d", counter);
	alarm(1);
}

int main(void)
{
	/* oversized so we can read the whole buffer in */
	unsigned char buf[1024*1024];
	int len;
	unsigned i, j;

	signal(SIGALRM, progress);
	alarm(1);

	len = read(0, buf, sizeof(buf));
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];
		for (j = 0; j <= 0xff; j++) {
			buf[i] = j;

			counter++;
			if (try_zlib(buf, len))
				printf("i=%d, j=%x\n", i, j);
		}
		buf[i] = c;
	}

	alarm(0);
	fprintf(stderr, "\n");
	return 0;
}
-- >8 --

I compiled and ran with:

  gcc -Wall -Werror -O3 munge.c -o munge -lz
  ./munge <zlib

There were a few false positives early on (if you write "no data" in the
zlib header, zlib thinks it's just fine :) ). But I got a hit about
halfway through:

  i=5642, j=c7

I let it run to completion, and got a few more hits at the end (where it
was munging the crc to match our broken data). So there was a good
chance this middle hit was the source of the problem.
I confirmed by tweaking the byte in a hex editor, zlib inflating the
result (no errors!), and then piping the output into "git hash-object",
which reported the sha1 of the broken object. Success!

I fixed the packfile itself with:

  chmod +w $pack
  printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
  chmod -w $pack

The '\xc7' comes from the replacement byte our "munge" program found.
The offset 51659518 is derived by taking the original object offset
(51653873), adding the replacement offset found by "munge" (5642), and
then adding back in the 3 bytes of git header we stripped.

After that, "git fsck" ran clean.

As for the corruption itself, I was lucky that it was indeed a single
byte. In fact, it turned out to be a single bit. The byte 0xc7 was
corrupted to 0xc5. So presumably it was caused by faulty hardware, or a
cosmic ray.

And the aborted attempt to look at the inflated output to see what was
wrong? I could have looked forever and never found it. Here's the diff
between what the corrupted data inflates to, versus the real data:

  - cp = strtok (arg, "+");
  + cp = strtok (arg, ".");

It tweaked one byte and still ended up as valid, readable C that just
happened to do something totally different! One takeaway is that on a
less unlucky day, looking at the zlib output might have actually been
helpful, as most random changes would actually break the C code.

But more importantly, git's hashing and checksumming noticed a problem
that easily could have gone undetected in another system. The result
still compiled, but would have caused an interesting bug (that would
have been blamed on some random commit).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem
  2013-10-16  8:34 pack corruption post-mortem Jeff King
@ 2013-10-16  8:59 ` Duy Nguyen
  2013-10-16 15:41 ` Martin Fick
  ` (2 subsequent siblings)

3 siblings, 0 replies; 16+ messages in thread
From: Duy Nguyen @ 2013-10-16  8:59 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List

On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote:
> I was recently presented with a repository with a corrupted packfile,
> and was asked if the data was recoverable. This post-mortem describes
> the steps I took to investigate and fix the problem. I thought others
> might find the process interesting, and it might help somebody in the
> same situation.

It's like reading an LWN article. Thank you.
--
Duy
* Re: pack corruption post-mortem
  2013-10-16  8:34 pack corruption post-mortem Jeff King
  2013-10-16  8:59 ` Duy Nguyen
@ 2013-10-16 15:41 ` Martin Fick
  2013-10-17  0:35   ` Jeff King
  2013-10-17  1:06   ` Duy Nguyen
  2013-10-19 10:32 ` Duy Nguyen
  2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King

3 siblings, 2 replies; 16+ messages in thread
From: Martin Fick @ 2013-10-16 15:41 UTC (permalink / raw)
To: Jeff King; +Cc: git

On Wednesday, October 16, 2013 02:34:01 am Jeff King wrote:
> I was recently presented with a repository with a
> corrupted packfile, and was asked if the data was
> recoverable. This post-mortem describes the steps I took
> to investigate and fix the problem. I thought others
> might find the process interesting, and it might help
> somebody in the same situation.

This is awesome Peff, thanks for the great writeup!

I have nightmares about this sort of thing every now and
then, and we even experience some corruption here and there
that needs to be fixed (mainly missing objects when we toy
with different git repack arguments). I cannot help but
wonder, how we can improve git further to either help
diagnose or even fix some of these problems? More inline
below...

> The first thing I did was pull the broken data out of the
> packfile. I needed to know how big the object was, which
> I found out with:
>
>   $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
>   51653873
>   51664736
>
> Show-index gives us the list of objects and their
> offsets. We throw away everything but the offsets, and
> then sort them so that our interesting offset (which we
> got from the fsck output above) is followed immediately
> by the offset of the next object.
> Now we know that the object data is 10863 bytes long, and
> we can grab it with:
>
>   dd if=$pack of=object bs=1 skip=51653873 count=10863

Is there a current plumbing command that should be enhanced
to be able to do the 2 steps above directly for people
debugging (maybe with some new switch)? If not, should we
create one, git show --zlib, or git cat-file --zlib?

> Note that the "object" file isn't fit for feeding
> straight to zlib; it has the git packed object header,
> which is variable-length. We want to strip that off so
> we can start playing with the zlib data directly. You
> can either work your way through it manually (the format
> is described in
> Documentation/technical/pack-format.txt), or you can
> walk through it in a debugger. I did the latter,
> creating a valid pack like:
>
>   # pack magic and version
>   printf 'PACK\0\0\0\2' >tmp.pack
>   # pack has one object
>   printf '\0\0\0\1' >>tmp.pack
>   # now add our object data
>   cat object >>tmp.pack
>   # and then append the pack trailer
>   /path/to/git.git/test-sha1 -b <tmp.pack >trailer
>   cat trailer >>tmp.pack
>
> and then running "git index-pack tmp.pack" in the
> debugger (stop at unpack_raw_entry). Doing this, I found
> that there were 3 bytes of header (and the header itself
> had a sane type and size). So I stripped those off with:
>
>   dd if=object of=zlib bs=1 skip=3

This too feels like something we should be able to do with a
plumbing command eventually?

git zlib-extract

> So I took a different approach. Working under the guess
> that the corruption was limited to a single byte, I
> wrote a program to munge each byte individually, and try
> inflating the result. Since the object was only 10K
> compressed, that worked out to about 2.5M attempts,
> which took a few minutes.

Awesome! Would this make a good new plumbing command, git
zlib-fix?
> I fixed the packfile itself with:
>
>   chmod +w $pack
>   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
>   chmod -w $pack
>
> The '\xc7' comes from the replacement byte our "munge"
> program found. The offset 51659518 is derived by taking
> the original object offset (51653873), adding the
> replacement offset found by "munge" (5642), and then
> adding back in the 3 bytes of git header we stripped.

Another plumbing command needed? git pack-put --zlib?

I am not saying my command suggestions are good, but maybe
they will inspire the right answer?

-Martin
* Re: pack corruption post-mortem
  2013-10-16 15:41 ` Martin Fick
@ 2013-10-17  0:35   ` Jeff King
  2013-10-17 15:47     ` Junio C Hamano
  2013-10-17  1:06   ` Duy Nguyen

1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2013-10-17  0:35 UTC (permalink / raw)
To: Martin Fick; +Cc: git

On Wed, Oct 16, 2013 at 09:41:16AM -0600, Martin Fick wrote:

> I have nightmares about this sort of thing every now and
> then, and we even experience some corruption here and there
> that needs to be fixed (mainly missing objects when we toy
> with different git repack arguments). I cannot help but
> wonder, how we can improve git further to either help
> diagnose or even fix some of these problems? More inline
> below...

In general, I don't think we know enough about patterns of recovery
corruption to say which commands would definitely be worth implementing.
Part of the reason I wrote this up is to document this one case. But
this is the first time in 7 years of git usage that I've had to do this.
So I'd feel a little bit better about sinking time into it after seeing
a few more cases and realizing where the patterns are.

One of the major hassles is that the assumptions you can and can't make
depend on what data you have that _isn't_ corrupted. Do you have a pack
index, or a bare pack? Do you have zlib data that fails the crc, or zlib
data that cannot be parsed?

In this case there was no other copy of the repository. But if you know
the broken object (which we did here), and you can copy it from
elsewhere, then git already will try to find other sources of the object
(loose, or in another pack).

> > dd if=$pack of=object bs=1 skip=51653873 count=10863
>
> Is there a current plumbing command that should be enhanced
> to be able to do the 2 steps above directly for people
> debugging (maybe with some new switch)? If not, should we
> create one, git show --zlib, or git cat-file --zlib?

Most of the git plumbing commands deal with data at the object layer.
This is really about going a step below and saying "Give me the on-disk
representation of the object".

We recently introduced an "%(objectsize:disk)" formatter for cat-file.
The logical extension would be to ask for "%(contents:disk)" or
something. Though what you get would depend on how the object is stored,
so you would need to figure that out to do anything useful with it.

Note that this implies you actually have a packfile index that says
"object XXX is at offset YYY". In some corruption cases, you might have
only a packfile. That is generally enough to generate the index, but if
there is corruption, you cannot actually parse the pack to find out the
sha1 of the objects. So in the worst case, what you really want is
something like "dump the object in packfile X at offset Y". But even
then, you don't know the length of the object. The packfile is a stream,
and the length we calculated is from the index, which depends on the
zlib data parsing in some sane way.

> > and then running "git index-pack tmp.pack" in the
> > debugger (stop at unpack_raw_entry). Doing this, I found
> > that there were 3 bytes of header (and the header itself
> > had a sane type and size). So I stripped those off with:
> >
> >   dd if=object of=zlib bs=1 skip=3
>
> This too feels like something we should be able to do with a
> plumbing command eventually?
>
> git zlib-extract

Perhaps. I think if you had some "extract object at offset X from the
packfile" command, it would be optional to give you the whole thing, or
just the zlib data.

> > So I took a different approach. Working under the guess
> > that the corruption was limited to a single byte, I
> > wrote a program to munge each byte individually, and try
> > inflating the result. Since the object was only 10K
> > compressed, that worked out to about 2.5M attempts,
> > which took a few minutes.
>
> Awesome! Would this make a good new plumbing command, git
> zlib-fix?

I'd like to see it actually work more than once first.
This relies on there being single-byte corruption. Even double-byte
corruption starts to get expensive to brute-force like this. SHA1, by
its nature, requires brute-forcing. But it's possible that the crc, not
being cryptographically secure, could be reverse-engineered to find
likely spots of corruption. I don't know enough about it to say.

> > I fixed the packfile itself with:
> >
> >   chmod +w $pack
> >   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
> >   chmod -w $pack
> >
> > The '\xc7' comes from the replacement byte our "munge"
> > program found. The offset 51659518 is derived by taking
> > the original object offset (51653873), adding the
> > replacement offset found by "munge" (5642), and then
> > adding back in the 3 bytes of git header we stripped.
>
> Another plumbing command needed? git pack-put --zlib?

I think in this case that dd does a nice job of solving the problem.
Some of the stuff I did was very git-specific and required knowledge of
the formats. But this one is really just "replace byte X at offset Y",
and I don't see any need to avoid a general-purpose tool (except that dd
is itself reasonably arcane :) ).

-Peff
* Re: pack corruption post-mortem
  2013-10-17  0:35   ` Jeff King
@ 2013-10-17 15:47     ` Junio C Hamano
  2013-10-25  7:55       ` Jeff King

0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2013-10-17 15:47 UTC (permalink / raw)
To: Jeff King; +Cc: Martin Fick, git

Jeff King <peff@peff.net> writes:

> On Wed, Oct 16, 2013 at 09:41:16AM -0600, Martin Fick wrote:
>
>> I have nightmares about this sort of thing every now and
>> then, and we even experience some corruption here and there
>> that needs to be fixed (mainly missing objects when we toy
>> with different git repack arguments). I cannot help but
>> wonder, how we can improve git further to either help
>> diagnose or even fix some of these problems? More inline
>> below...
>
> In general, I don't think we know enough about patterns of recovery
> corruption to say which commands would definitely be worth implementing.
> Part of the reason I wrote this up is to document this one case. But
> this is the first time in 7 years of git usage that I've had to do this.
> So I'd feel a little bit better about sinking time into it after seeing
> a few more cases and realizing where the patterns are.

There was one area in our Documentation/ set we used to use to keep
this kind of message almost as-is; perhaps this message fits there?
* Re: pack corruption post-mortem
  2013-10-17 15:47     ` Junio C Hamano
@ 2013-10-25  7:55       ` Jeff King

0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2013-10-25  7:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Martin Fick, git

On Thu, Oct 17, 2013 at 08:47:05AM -0700, Junio C Hamano wrote:

> > In general, I don't think we know enough about patterns of recovery
> > corruption to say which commands would definitely be worth implementing.
> > Part of the reason I wrote this up is to document this one case. But
> > this is the first time in 7 years of git usage that I've had to do this.
> > So I'd feel a little bit better about sinking time into it after seeing
> > a few more cases and realizing where the patterns are.
>
> There was one area in our Documentation/ set we used to use to keep
> this kind of message almost as-is; perhaps this message fits there?

I've wondered if that howto section has much value, as they are already
available in the list archive, and often show their age after a while.
Still, I suppose it is a sort of curated list of interesting posts.

Here's my article, all gussied up for the howto directory. Take it or
leave it.

-- >8 --
Subject: [PATCH] howto: add article on recovering a corrupted object

This is an asciidoc-ified version of a corruption post-mortem sent to
the git list. It complements the existing howto article, since it covers
a case where the object couldn't be easily recreated or copied from
elsewhere.
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/Makefile                             |   1 +
 .../howto/recover-corrupted-object-harder.txt      | 242 +++++++++++++++++++++
 2 files changed, 243 insertions(+)
 create mode 100644 Documentation/howto/recover-corrupted-object-harder.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 4f13a23..91a12c7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -53,6 +53,7 @@ SP_ARTICLES += howto/setup-git-server-over-http
 SP_ARTICLES += howto/separating-topic-branches
 SP_ARTICLES += howto/revert-a-faulty-merge
 SP_ARTICLES += howto/recover-corrupted-blob-object
+SP_ARTICLES += howto/recover-corrupted-object-harder
 SP_ARTICLES += howto/rebuild-from-update-hook
 SP_ARTICLES += howto/rebase-from-internal-branch
 SP_ARTICLES += howto/maintain-git
diff --git a/Documentation/howto/recover-corrupted-object-harder.txt b/Documentation/howto/recover-corrupted-object-harder.txt
new file mode 100644
index 0000000..6f33dac
--- /dev/null
+++ b/Documentation/howto/recover-corrupted-object-harder.txt
@@ -0,0 +1,242 @@
+Date: Wed, 16 Oct 2013 04:34:01 -0400
+From: Jeff King <peff@peff.net>
+Subject: pack corruption post-mortem
+Abstract: Recovering a corrupted object when no good copy is available.
+Content-type: text/asciidoc
+
+How to recover an object from scratch
+=====================================
+
+I was recently presented with a repository with a corrupted packfile,
+and was asked if the data was recoverable. This post-mortem describes
+the steps I took to investigate and fix the problem. I thought others
+might find the process interesting, and it might help somebody in the
+same situation.
+
+********************************
+Note: In this case, no good copy of the repository was available. For
+the much easier case where you can get the corrupted object from
+elsewhere, see link:recover-corrupted-blob-object.html[this howto].
+********************************
+
+I started with an fsck, which found a problem with exactly one object
+(I've used $pack and $obj below to keep the output readable, and also
+because I'll refer to them later):
+
+-----------
+ $ git fsck
+ error: $pack SHA1 checksum mismatch
+ error: index CRC mismatch for object $obj from $pack at offset 51653873
+ error: inflate: data stream error (incorrect data check)
+ error: cannot unpack $obj from $pack at offset 51653873
+-----------
+
+The pack checksum failing means a byte is munged somewhere, and it is
+presumably in the object mentioned (since both the index checksum and
+zlib were failing).
+
+Reading the zlib source code, I found that "incorrect data check" means
+that the adler-32 checksum at the end of the zlib data did not match the
+inflated data. So stepping the data through zlib would not help, as it
+did not fail until the very end, when we realize the crc does not match.
+The problematic bytes could be anywhere in the object data.
+
+The first thing I did was pull the broken data out of the packfile. I
+needed to know how big the object was, which I found out with:
+
+------------
+ $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
+ 51653873
+ 51664736
+------------
+
+Show-index gives us the list of objects and their offsets. We throw away
+everything but the offsets, and then sort them so that our interesting
+offset (which we got from the fsck output above) is followed immediately
+by the offset of the next object. Now we know that the object data is
+10863 bytes long, and we can grab it with:
+
+------------
+ dd if=$pack of=object bs=1 skip=51653873 count=10863
+------------
+
+I inspected a hexdump of the data, looking for any obvious bogosity
+(e.g., a 4K run of zeroes would be a good sign of filesystem
+corruption). But everything looked pretty reasonable.
+
+Note that the "object" file isn't fit for feeding straight to zlib; it
+has the git packed object header, which is variable-length. We want to
+strip that off so we can start playing with the zlib data directly. You
+can either work your way through it manually (the format is described in
+link:../technical/pack-format.html[Documentation/technical/pack-format.txt]),
+or you can walk through it in a debugger. I did the latter, creating a
+valid pack like:
+
+------------
+ # pack magic and version
+ printf 'PACK\0\0\0\2' >tmp.pack
+ # pack has one object
+ printf '\0\0\0\1' >>tmp.pack
+ # now add our object data
+ cat object >>tmp.pack
+ # and then append the pack trailer
+ /path/to/git.git/test-sha1 -b <tmp.pack >trailer
+ cat trailer >>tmp.pack
+------------
+
+and then running "git index-pack tmp.pack" in the debugger (stop at
+unpack_raw_entry). Doing this, I found that there were 3 bytes of header
+(and the header itself had a sane type and size). So I stripped those
+off with:
+
+------------
+ dd if=object of=zlib bs=1 skip=3
+------------
+
+I ran the result through zlib's inflate using a custom C program. And
+while it did report the error, I did get the right number of output
+bytes (i.e., it matched git's size header that we decoded above). But
+feeding the result back to "git hash-object" didn't produce the same
+sha1. So there were some wrong bytes, but I didn't know which. The file
+happened to be C source code, so I hoped I could notice something
+obviously wrong with it, but I didn't. I even got it to compile!
+
+I also tried comparing it to other versions of the same path in the
+repository, hoping that there would be some part of the diff that didn't
+make sense. Unfortunately, this happened to be the only revision of this
+particular file in the repository, so I had nothing to compare against.
+
+So I took a different approach. Working under the guess that the
+corruption was limited to a single byte, I wrote a program to munge each
+byte individually, and try inflating the result. Since the object was
+only 10K compressed, that worked out to about 2.5M attempts, which took
+a few minutes.
+
+The program I used is here:
+
+----------------------------------------------
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <signal.h>
+#include <zlib.h>
+
+static int try_zlib(unsigned char *buf, int len)
+{
+	/* make this absurdly large so we don't have to loop */
+	static unsigned char out[1024*1024];
+	z_stream z;
+	int ret;
+
+	memset(&z, 0, sizeof(z));
+	inflateInit(&z);
+
+	z.next_in = buf;
+	z.avail_in = len;
+	z.next_out = out;
+	z.avail_out = sizeof(out);
+
+	ret = inflate(&z, 0);
+	inflateEnd(&z);
+	return ret >= 0;
+}
+
+/* eye candy */
+static int counter = 0;
+static void progress(int sig)
+{
+	fprintf(stderr, "\r%d", counter);
+	alarm(1);
+}
+
+int main(void)
+{
+	/* oversized so we can read the whole buffer in */
+	unsigned char buf[1024*1024];
+	int len;
+	unsigned i, j;
+
+	signal(SIGALRM, progress);
+	alarm(1);
+
+	len = read(0, buf, sizeof(buf));
+	for (i = 0; i < len; i++) {
+		unsigned char c = buf[i];
+		for (j = 0; j <= 0xff; j++) {
+			buf[i] = j;
+
+			counter++;
+			if (try_zlib(buf, len))
+				printf("i=%d, j=%x\n", i, j);
+		}
+		buf[i] = c;
+	}
+
+	alarm(0);
+	fprintf(stderr, "\n");
+	return 0;
+}
+----------------------------------------------
+
+I compiled and ran with:
+
+-------
+ gcc -Wall -Werror -O3 munge.c -o munge -lz
+ ./munge <zlib
+-------
+
+There were a few false positives early on (if you write "no data" in the
+zlib header, zlib thinks it's just fine :) ). But I got a hit about
+halfway through:
+
+-------
+ i=5642, j=c7
+-------
+
+I let it run to completion, and got a few more hits at the end (where it
+was munging the crc to match our broken data). So there was a good
+chance this middle hit was the source of the problem.
+
+I confirmed by tweaking the byte in a hex editor, zlib inflating the
+result (no errors!), and then piping the output into "git hash-object",
+which reported the sha1 of the broken object. Success!
+
+I fixed the packfile itself with:
+
+-------
+ chmod +w $pack
+ printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
+ chmod -w $pack
+-------
+
+The `\xc7` comes from the replacement byte our "munge" program found.
+The offset 51659518 is derived by taking the original object offset
+(51653873), adding the replacement offset found by "munge" (5642), and
+then adding back in the 3 bytes of git header we stripped.
+
+After that, "git fsck" ran clean.
+
+As for the corruption itself, I was lucky that it was indeed a single
+byte. In fact, it turned out to be a single bit. The byte 0xc7 was
+corrupted to 0xc5. So presumably it was caused by faulty hardware, or a
+cosmic ray.
+
+And the aborted attempt to look at the inflated output to see what was
+wrong? I could have looked forever and never found it. Here's the diff
+between what the corrupted data inflates to, versus the real data:
+
+--------------
+ - cp = strtok (arg, "+");
+ + cp = strtok (arg, ".");
+--------------
+
+It tweaked one byte and still ended up as valid, readable C that just
+happened to do something totally different! One takeaway is that on a
+less unlucky day, looking at the zlib output might have actually been
+helpful, as most random changes would actually break the C code.
+
+But more importantly, git's hashing and checksumming noticed a problem
+that easily could have gone undetected in another system. The result
+still compiled, but would have caused an interesting bug (that would
+have been blamed on some random commit).
--
1.8.4.1.898.g8bf8a41.dirty
* Re: pack corruption post-mortem
  2013-10-16 15:41 ` Martin Fick
  2013-10-17  0:35   ` Jeff King
@ 2013-10-17  1:06   ` Duy Nguyen

1 sibling, 0 replies; 16+ messages in thread
From: Duy Nguyen @ 2013-10-17  1:06 UTC (permalink / raw)
To: Martin Fick; +Cc: Jeff King, Git Mailing List

On Wed, Oct 16, 2013 at 10:41 PM, Martin Fick <mfick@codeaurora.org> wrote:
>> and then running "git index-pack tmp.pack" in the
>> debugger (stop at unpack_raw_entry). Doing this, I found
>> that there were 3 bytes of header (and the header itself
>> had a sane type and size). So I stripped those off with:
>>
>>   dd if=object of=zlib bs=1 skip=3
>
> This too feels like something we should be able to do with a
> plumbing command eventually?
>
> git zlib-extract

Not an official plumbing, but I faced similar problems with pack v4. I
needed to verify that the output is correct and low level decoding like
this is generally a good thing to start with. So I wrote test-dump [1]
that can take an offset, a format and try to decode it. It does not
support zlib inflation yet, but adding one should be easy. And because
this is just a test program we don't really need to think hard before
adding something.

[1] http://article.gmane.org/gmane.comp.version-control.git/235388
--
Duy
* Re: pack corruption post-mortem
  2013-10-16  8:34 pack corruption post-mortem Jeff King
  2013-10-16  8:59 ` Duy Nguyen
  2013-10-16 15:41 ` Martin Fick
@ 2013-10-19 10:32 ` Duy Nguyen
  2013-10-19 14:41   ` Nicolas Pitre
  2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King

3 siblings, 1 reply; 16+ messages in thread
From: Duy Nguyen @ 2013-10-19 10:32 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Git Mailing List, Jeff King

On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote:
> I was recently presented with a repository with a corrupted packfile,
> and was asked if the data was recoverable. This post-mortem describes
> the steps I took to investigate and fix the problem. I thought others
> might find the process interesting, and it might help somebody in the
> same situation.
>
> I started with an fsck, which found a problem with exactly one object
> (I've used $pack and $obj below to keep the output readable, and also
> because I'll refer to them later):
>
>   $ git fsck
>   error: $pack SHA1 checksum mismatch
>   error: index CRC mismatch for object $obj from $pack at offset 51653873
>   error: inflate: data stream error (incorrect data check)
>   error: cannot unpack $obj from $pack at offset 51653873

I wonder if we should protect the sha-1 and pathname tables in packv4
with CRC too. A bit flipped in there could cause a stream of corrupt
objects and make it hard to pinpoint the corrupt location..
--
Duy
* Re: pack corruption post-mortem 2013-10-19 10:32 ` Duy Nguyen @ 2013-10-19 14:41 ` Nicolas Pitre 2013-10-19 19:17 ` Shawn Pearce 2013-10-20 4:44 ` Duy Nguyen 0 siblings, 2 replies; 16+ messages in thread From: Nicolas Pitre @ 2013-10-19 14:41 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List, Jeff King On Sat, 19 Oct 2013, Duy Nguyen wrote: > On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote: > > I was recently presented with a repository with a corrupted packfile, > > and was asked if the data was recoverable. This post-mortem describes > > the steps I took to investigate and fix the problem. I thought others > > might find the process interesting, and it might help somebody in the > > same situation. > > > > I started with an fsck, which found a problem with exactly one object > > (I've used $pack and $obj below to keep the output readable, and also > > because I'll refer to them later): > > > > $ git fsck > > error: $pack SHA1 checksum mismatch > > error: index CRC mismatch for object $obj from $pack at offset 51653873 > > error: inflate: data stream error (incorrect data check) > > error: cannot unpack $obj from $pack at offset 51653873 > > I wonder if we should protect the sha-1 and pathname tables in packv4 > with CRC too. A bit flipped in there could cause stream of corrupt > objects and make it hard to pinpoint the corrupt location.. It turns out that we already have this covered. The SHA1 used in the name of the pack file is actually the SHA1 checksum of the SHA1 table. The path and ident tables are already protected by the CRC32 in the zlib deflated stream. Normal objects are also zlib deflated (except for their header) but you need to inflate them in order to have this CRC verified, which the pack data copy tries to avoid. Hence the separate CRC32 in the index file in that case. However the pack v4 tables are very unlikely to be reused as is from one pack to another. 
Nicolas ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem 2013-10-19 14:41 ` Nicolas Pitre @ 2013-10-19 19:17 ` Shawn Pearce 2013-10-20 20:56 ` Nicolas Pitre 2013-10-20 4:44 ` Duy Nguyen 1 sibling, 1 reply; 16+ messages in thread From: Shawn Pearce @ 2013-10-19 19:17 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Duy Nguyen, Git Mailing List, Jeff King On Sat, Oct 19, 2013 at 7:41 AM, Nicolas Pitre <nico@fluxnic.net> wrote: > On Sat, 19 Oct 2013, Duy Nguyen wrote: > >> On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote: >> > I was recently presented with a repository with a corrupted packfile, >> > and was asked if the data was recoverable. This post-mortem describes >> > the steps I took to investigate and fix the problem. I thought others >> > might find the process interesting, and it might help somebody in the >> > same situation. >> > >> > I started with an fsck, which found a problem with exactly one object >> > (I've used $pack and $obj below to keep the output readable, and also >> > because I'll refer to them later): >> > >> > $ git fsck >> > error: $pack SHA1 checksum mismatch >> > error: index CRC mismatch for object $obj from $pack at offset 51653873 >> > error: inflate: data stream error (incorrect data check) >> > error: cannot unpack $obj from $pack at offset 51653873 >> >> I wonder if we should protect the sha-1 and pathname tables in packv4 >> with CRC too. A bit flipped in there could cause stream of corrupt >> objects and make it hard to pinpoint the corrupt location.. > > It turns out that we already have this covered. > > The SHA1 used in the name of the pack file is actually the SHA1 checksum > of the SHA1 table. I continue to believe this naming is wrong. The pack file name should be the SHA1 checksum of the pack data stream, not the SHA1 table. This would allow cleaner update of a repository that was repacked with different compression settings, but identical objects. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem 2013-10-19 19:17 ` Shawn Pearce @ 2013-10-20 20:56 ` Nicolas Pitre 0 siblings, 0 replies; 16+ messages in thread From: Nicolas Pitre @ 2013-10-20 20:56 UTC (permalink / raw) To: Shawn Pearce; +Cc: Duy Nguyen, Git Mailing List, Jeff King On Sat, 19 Oct 2013, Shawn Pearce wrote: > On Sat, Oct 19, 2013 at 7:41 AM, Nicolas Pitre <nico@fluxnic.net> wrote: > > On Sat, 19 Oct 2013, Duy Nguyen wrote: > > > >> On Wed, Oct 16, 2013 at 3:34 PM, Jeff King <peff@peff.net> wrote: > >> > I was recently presented with a repository with a corrupted packfile, > >> > and was asked if the data was recoverable. This post-mortem describes > >> > the steps I took to investigate and fix the problem. I thought others > >> > might find the process interesting, and it might help somebody in the > >> > same situation. > >> > > >> > I started with an fsck, which found a problem with exactly one object > >> > (I've used $pack and $obj below to keep the output readable, and also > >> > because I'll refer to them later): > >> > > >> > $ git fsck > >> > error: $pack SHA1 checksum mismatch > >> > error: index CRC mismatch for object $obj from $pack at offset 51653873 > >> > error: inflate: data stream error (incorrect data check) > >> > error: cannot unpack $obj from $pack at offset 51653873 > >> > >> I wonder if we should protect the sha-1 and pathname tables in packv4 > >> with CRC too. A bit flipped in there could cause stream of corrupt > >> objects and make it hard to pinpoint the corrupt location.. > > > > It turns out that we already have this covered. > > > > The SHA1 used in the name of the pack file is actually the SHA1 checksum > > of the SHA1 table. > > I continue to believe this naming is wrong. The pack file name should > be the SHA1 checksum of the pack data stream, not the SHA1 table. This > would allow cleaner update of a repository that was repacked with > different compression settings, but identical objects. 
OK, after some thought, I decided that it is best _not_ to rely on the pack name. The pack name currently carries no meaning whatsoever and git works just fine if some packs are arbitrarily named. Your concern about its current naming scheme is certainly legitimate, and I don't want pack v4 to introduce any kind of restrictions here. Furthermore, the SHA1 table is the only pack element whose integrity may not independently be verified at the moment. This makes corruption isolation much harder when receiving a streamed pack. Therefore I decided to introduce a small pack v4 format change with the following patch: ----- >8 From: Nicolas Pitre <nico@fluxnic.net> Date: Sun, 20 Oct 2013 14:52:24 -0400 Subject: [PATCH] pack v4: add a SHA1 checksum to the SHA1 table Every packed element currently has some integrity protection coming from the CRC32 embedded in the zlib deflated stream, except for the SHA1 table. Some bit flip corruption in the SHA1 table may have repercussions on a whole lot of objects and could be very hard to isolate. Let's add some integrity protection on the SHA1 table by terminating it with an additional SHA1 entry being the SHA1 checksum of the table. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index fd2e737..a370f26 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -94,7 +94,9 @@ Git pack format === Pack v4 tables - A table of sorted SHA-1 object names for all objects contained in - the on-disk pack. + the on-disk pack, with a final entry being the SHA1 sum of all the + previous entries. The size of this table is therefore + (nr_objects + 1) * 20 bytes. The SHA-1 table in thin packs must include the omitted objects as well. 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c index caec388..01300d6 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1471,7 +1471,10 @@ static struct packv4_dict *read_dict(void) static void parse_dictionaries(void) { + git_SHA_CTX ctx; + unsigned char table_sha1[20]; int i; + if (!packv4) return; @@ -1485,6 +1488,12 @@ static void parse_dictionaries(void) die(_("wrong order in SHA-1 table at entry %d"), i); } + git_SHA1_Init(&ctx); + git_SHA1_Update(&ctx, sha1_table, 20 * nr_objects_final); + git_SHA1_Final(table_sha1, &ctx); + if (hashcmp(table_sha1, fill_and_use(20)) != 0) + die(_("SHA-1 table checksum mismatch")); + name_dict = read_dict(); path_dict = read_dict(); } diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 9fd5640..eb57ada 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -816,11 +816,18 @@ static void unpack_all(void) use(sizeof(struct pack_header)); if (packv4) { + git_SHA_CTX ctx; + unsigned char table_sha1[20]; sha1_table = xmalloc(20 * nr_objects); for (i = 0; i < nr_objects; i++) { unsigned char *p = sha1_table + i * 20; hashcpy(p, fill_and_use(20)); } + git_SHA1_Init(&ctx); + git_SHA1_Update(&ctx, sha1_table, 20 * nr_objects); + git_SHA1_Final(table_sha1, &ctx); + if (hashcmp(table_sha1, fill_and_use(20)) != 0) + die("SHA-1 table checksum mismatch"); name_dict = read_dict(); path_dict = read_dict(); } diff --git a/packv4-create.c b/packv4-create.c index 14be867..7b51792 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -688,13 +688,20 @@ unsigned long packv4_write_tables(struct sha1file *f, struct pack_idx_entry *objs = v4->all_objs; struct dict_table *commit_ident_table = v4->commit_ident_table; struct dict_table *tree_path_table = v4->tree_path_table; + git_SHA_CTX ctx; + unsigned char table_sha1[20]; unsigned i; unsigned long written = 0; /* The sorted list of object SHA1's is always first */ - for (i = 0; i < nr_objects; i++) + git_SHA1_Init(&ctx); + for (i = 0; i < 
nr_objects; i++) { + git_SHA1_Update(&ctx, objs[i].sha1, 20); sha1write(f, objs[i].sha1, 20); - written = 20 * nr_objects; + } + git_SHA1_Final(table_sha1, &ctx); + sha1write(f, table_sha1, 20); + written = 20 * (nr_objects + 1); /* Then the commit dictionary table */ written += write_dict_table(f, commit_ident_table, diff --git a/packv4-parse.c b/packv4-parse.c index 31c89c7..e6f5028 100644 --- a/packv4-parse.c +++ b/packv4-parse.c @@ -128,7 +128,7 @@ static struct packv4_dict *load_dict(struct packed_git *p, off_t *offset) static void load_ident_dict(struct packed_git *p) { - off_t offset = 12 + p->num_objects * 20; + off_t offset = 12 + (p->num_objects + 1) * 20; struct packv4_dict *names = load_dict(p, &offset); if (!names) die("bad pack name dictionary in %s", p->pack_name); Nicolas ^ permalink raw reply related [flat|nested] 16+ messages in thread
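[Editorial note: the on-disk scheme this patch introduces can be modelled in a few lines. The sketch below is an illustrative Python model only, with made-up helper names; the real pack v4 implementation is the C above. The sorted 20-byte entries are followed by one extra entry holding the SHA-1 of the table itself, so a reader can validate the table before trusting it.]

```python
import hashlib

def write_sha1_table(oids):
    """Sorted SHA-1 entries plus a trailing SHA-1 over the table:
    (nr_objects + 1) * 20 bytes, as in the pack-format.txt hunk above."""
    table = b"".join(sorted(oids))
    return table + hashlib.sha1(table).digest()

def read_sha1_table(buf, nr_objects):
    """Split the table from its trailing checksum and verify it."""
    table = buf[:20 * nr_objects]
    trailer = buf[20 * nr_objects:20 * (nr_objects + 1)]
    if hashlib.sha1(table).digest() != trailer:
        raise ValueError("SHA-1 table checksum mismatch")
    return [table[i:i + 20] for i in range(0, len(table), 20)]

# Round-trip three fake object ids, then corrupt a single bit:
oids = [hashlib.sha1(bytes([i])).digest() for i in range(3)]
buf = write_sha1_table(oids)
assert read_sha1_table(buf, 3) == sorted(oids)

bad = bytearray(buf)
bad[5] ^= 0x40          # one flipped bit anywhere in the table is now caught
try:
    read_sha1_table(bytes(bad), 3)
except ValueError as e:
    print(e)            # SHA-1 table checksum mismatch
```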
* Re: pack corruption post-mortem 2013-10-19 14:41 ` Nicolas Pitre 2013-10-19 19:17 ` Shawn Pearce @ 2013-10-20 4:44 ` Duy Nguyen 2013-10-20 21:08 ` Nicolas Pitre 1 sibling, 1 reply; 16+ messages in thread From: Duy Nguyen @ 2013-10-20 4:44 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Git Mailing List, Jeff King On Sat, Oct 19, 2013 at 9:41 PM, Nicolas Pitre <nico@fluxnic.net> wrote: > On Sat, 19 Oct 2013, Duy Nguyen wrote: > The SHA1 used in the name of the pack file is actually the SHA1 checksum > of the SHA1 table. > > The path and ident tables are already protected by the CRC32 in the zlib > deflated stream. > > Normal objects are also zlib deflated (except for their header) but you > need to inflate them in order to have this CRC verified, which the pack > data copy tries to avoid. Hence the separate CRC32 in the index file in > that case. OK slight change in the subject, what about reading code (i.e. sha1_file.c)? With v2 crc32 is verified by object inflate code. With v4 trees or commits, because we store some (or all) data outside of the deflated stream, we will not benefit from crc32 verification previously done for all trees and commits. Should we perform an explicit crc32 check when reading v4 trees and commits (and maybe verify the sha-1 table too)? -- Duy ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: pack corruption post-mortem 2013-10-20 4:44 ` Duy Nguyen @ 2013-10-20 21:08 ` Nicolas Pitre 0 siblings, 0 replies; 16+ messages in thread From: Nicolas Pitre @ 2013-10-20 21:08 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List, Jeff King On Sun, 20 Oct 2013, Duy Nguyen wrote: > On Sat, Oct 19, 2013 at 9:41 PM, Nicolas Pitre <nico@fluxnic.net> wrote: > > On Sat, 19 Oct 2013, Duy Nguyen wrote: > > The SHA1 used in the name of the pack file is actually the SHA1 checksum > > of the SHA1 table. > > > > The path and ident tables are already protected by the CRC32 in the zlib > > deflated stream. > > > > Normal objects are also zlib deflated (except for their header) but you > > need to inflate them in order to have this CRC verified, which the pack > > data copy tries to avoid. Hence the separate CRC32 in the index file in > > that case. > > OK slight change in the subject, what about reading code (i.e. > sha1_file.c)? With v2 crc32 is verified by object inflate code. With > v4 trees or commits, because we store some (or all) data outside of > the deflated stream, we will not benefit from crc32 verification > previously done for all trees and commits. Should we perform an explicit > crc32 check when reading v4 trees and commits (and maybe verify the > sha-1 table too)? I suppose that we should... at some point. I did the SHA1 table check only for index-pack and unpack-objects in my latest patch. Adding it to check_packed_git_idx() as well should be trivial. I'm not sure about the best way to do systematic checks on tree objects though. We have both the CRC32 recorded in the index file and the object SHA1 recorded in the SHA1 table. But any of them needs to be computed as we walk the object, and we currently haven't found the best way to do that yet. So I'd suggest postponing this until the tree walk is properly implemented to perform well first. Nicolas ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] howto: document more tools for recovery corruption 2013-10-16 8:34 pack corruption post-mortem Jeff King ` (2 preceding siblings ...) 2013-10-19 10:32 ` Duy Nguyen @ 2015-04-01 21:08 ` Jeff King 2015-04-01 22:21 ` Junio C Hamano 3 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2015-04-01 21:08 UTC (permalink / raw) To: git Long ago, I documented a corruption recovery I did and gave some C code that I used to help find a flipped bit. I had to fix a similar case recently, and I ended up writing a few more tools. I hope nobody ever has to use these, but it does not hurt to share them, just in case. Signed-off-by: Jeff King <peff@peff.net> --- .../howto/recover-corrupted-object-harder.txt | 237 +++++++++++++++++++++ 1 file changed, 237 insertions(+) diff --git a/Documentation/howto/recover-corrupted-object-harder.txt b/Documentation/howto/recover-corrupted-object-harder.txt index 23e685d..9c4cd09 100644 --- a/Documentation/howto/recover-corrupted-object-harder.txt +++ b/Documentation/howto/recover-corrupted-object-harder.txt @@ -240,3 +240,240 @@ But more importantly, git's hashing and checksumming noticed a problem that easily could have gone undetected in another system. The result still compiled, but would have caused an interesting bug (that would have been blamed on some random commit). + + +The adventure continues... +-------------------------- + +I ended up doing this again! Same entity, new hardware. The assumption +at this point is that the old disk corrupted the packfile, and then the +corruption was migrated to the new hardware (because it was done by +rsync or similar, and no fsck was done at the time of migration). + +This time, the affected blob was over 20 megabytes, which was far too +large to do a brute-force on. I followed the instructions above to +create the `zlib` file. I then used the `inflate` program below to pull +the corrupted data from that. 
Examining that output gave me a hint about +where in the file the corruption was. But now I was working with the +file itself, not the zlib contents. So knowing the sha1 of the object +and the approximate area of the corruption, I used the `sha1-munge` +program below to brute-force the correct byte. + +Here's the inflate program (it's essentially `gunzip` but without the +`.gz` header processing): + +-------------------------- +#include <stdio.h> +#include <string.h> +#include <zlib.h> +#include <stdlib.h> + +int main(int argc, char **argv) +{ + /* + * oversized so we can read the whole buffer in; + * this could actually be switched to streaming + * to avoid any memory limitations + */ + static unsigned char buf[25 * 1024 * 1024]; + static unsigned char out[25 * 1024 * 1024]; + int len; + z_stream z; + int ret; + + len = read(0, buf, sizeof(buf)); + memset(&z, 0, sizeof(z)); + inflateInit(&z); + + z.next_in = buf; + z.avail_in = len; + z.next_out = out; + z.avail_out = sizeof(out); + + ret = inflate(&z, 0); + if (ret != Z_OK && ret != Z_STREAM_END) + fprintf(stderr, "initial inflate failed (%d)\n", ret); + + fprintf(stderr, "outputting %lu bytes", z.total_out); + fwrite(out, 1, z.total_out, stdout); + return 0; +} +-------------------------- + +And here is the `sha1-munge` program: + +-------------------------- +#include <stdio.h> +#include <unistd.h> +#include <string.h> +#include <signal.h> +#include <openssl/sha.h> +#include <stdlib.h> + +/* eye candy */ +static int counter = 0; +static void progress(int sig) +{ + fprintf(stderr, "\r%d", counter); + alarm(1); +} + +static const signed char hexval_table[256] = { + -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 18-1f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 20-27 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 28-2f */ + 0, 1, 2, 3, 4, 5, 6, 7, /* 30-37 */ + 8, 9, -1, -1, -1, -1, -1, -1, /* 38-3f */ 
+ -1, 10, 11, 12, 13, 14, 15, -1, /* 40-47 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 48-4f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 50-57 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 58-5f */ + -1, 10, 11, 12, 13, 14, 15, -1, /* 60-67 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 68-6f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 70-77 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 78-7f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 80-87 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 88-8f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 90-97 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 98-9f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* a0-a7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* a8-af */ + -1, -1, -1, -1, -1, -1, -1, -1, /* b0-b7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* b8-bf */ + -1, -1, -1, -1, -1, -1, -1, -1, /* c0-c7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* c8-cf */ + -1, -1, -1, -1, -1, -1, -1, -1, /* d0-d7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* d8-df */ + -1, -1, -1, -1, -1, -1, -1, -1, /* e0-e7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* e8-ef */ + -1, -1, -1, -1, -1, -1, -1, -1, /* f0-f7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* f8-ff */ +}; + +static inline unsigned int hexval(unsigned char c) +{ +return hexval_table[c]; +} + +static int get_sha1_hex(const char *hex, unsigned char *sha1) +{ + int i; + for (i = 0; i < 20; i++) { + unsigned int val; + /* + * hex[1]=='\0' is caught when val is checked below, + * but if hex[0] is NUL we have to avoid reading + * past the end of the string: + */ + if (!hex[0]) + return -1; + val = (hexval(hex[0]) << 4) | hexval(hex[1]); + if (val & ~0xff) + return -1; + *sha1++ = val; + hex += 2; + } + return 0; +} + +int main(int argc, char **argv) +{ + /* oversized so we can read the whole buffer in */ + static unsigned char buf[25 * 1024 * 1024]; + char header[32]; + int header_len; + unsigned char have[20], want[20]; + int start, len; + SHA_CTX orig; + unsigned i, j; + + if (!argv[1] || get_sha1_hex(argv[1], want)) { + fprintf(stderr, "usage: sha1-munge <sha1> [start] <file.in\n"); 
+ return 1; + } + + if (argv[2]) + start = atoi(argv[2]); + else + start = 0; + + len = read(0, buf, sizeof(buf)); + header_len = sprintf(header, "blob %d", len) + 1; + fprintf(stderr, "using header: %s\n", header); + + /* + * We keep a running sha1 so that if you are munging + * near the end of the file, we do not have to re-sha1 + * the unchanged earlier bytes + */ + SHA1_Init(&orig); + SHA1_Update(&orig, header, header_len); + if (start) + SHA1_Update(&orig, buf, start); + + signal(SIGALRM, progress); + alarm(1); + + for (i = start; i < len; i++) { + unsigned char c; + SHA_CTX x; + +#if 0 + /* + * deletion -- this would not actually work in practice, + * I think, because we've already committed to a + * particular size in the header. Ditto for addition + * below. In those cases, you'd have to do the whole + * sha1 from scratch, or possibly keep three running + * "orig" sha1 computations going. + */ + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, buf + i + 1, len - i - 1); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, deletion\n", i); +#endif + + /* + * replacement -- note that this tries each of the 256 + * possible bytes. If you suspect a single-bit flip, + * it would be much shorter to just try the 8 + * bit-flipped variants. 
+ */ + c = buf[i]; + for (j = 0; j <= 0xff; j++) { + buf[i] = j; + + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, buf + i, len - i); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, j=%02x\n", i, j); + } + buf[i] = c; + +#if 0 + /* addition */ + for (j = 0; j <= 0xff; j++) { + unsigned char extra = j; + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, &extra, 1); + SHA1_Update(&x, buf + i, len - i); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, addition=%02x", i, j); + } +#endif + + SHA1_Update(&orig, buf + i, 1); + counter++; + } + + alarm(0); + fprintf(stderr, "\r%d\n", counter); + return 0; +} +-------------------------- -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 16+ messages in thread
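[Editorial note: the comment in sha1-munge observes that a suspected single-bit flip needs only 8 candidates per byte rather than 256. Below is a sketch of that shortcut as a made-up Python helper, not part of the patch, using the same running-hash trick so each candidate only re-hashes the suffix.]

```python
import hashlib

def find_bit_flip(data, want_hex, start=0):
    """Search for the single flipped bit that makes the git blob
    "blob <len>\\0<data>" hash to want_hex. A running SHA-1 covers the
    header and the bytes already ruled out, so each candidate only
    re-hashes the remaining suffix, as sha1-munge does."""
    buf = bytearray(data)
    running = hashlib.sha1(b"blob %d\x00" % len(buf))
    running.update(bytes(buf[:start]))
    for i in range(start, len(buf)):
        orig = buf[i]
        for bit in range(8):              # 8 flips, not 256 byte values
            buf[i] = orig ^ (1 << bit)
            h = running.copy()
            h.update(bytes(buf[i:]))
            if h.hexdigest() == want_hex:
                return i, buf[i]          # offset and corrected byte
        buf[i] = orig
        running.update(bytes(buf[i:i + 1]))
    return None

# Corrupt one bit of a small "blob" and recover it:
good = b"some file content\n" * 10
want = hashlib.sha1(b"blob %d\x00" % len(good) + good).hexdigest()
corrupt = bytearray(good)
corrupt[37] ^= 0x10
print(find_bit_flip(bytes(corrupt), want))   # prints the offset and corrected byte
```

The `running.copy()` call is what keeps this linear-ish in practice: bytes before position `i` are hashed exactly once, which matters when the corruption is near the end of a large file.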
* Re: [PATCH] howto: document more tools for recovery corruption 2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King @ 2015-04-01 22:21 ` Junio C Hamano 2015-04-02 0:49 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2015-04-01 22:21 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > Long ago, I documented a corruption recovery I did and gave > some C code that I used to help find a flipped bit. I had > to fix a similar case recently, and I ended up writing a few > more tools. I hope nobody ever has to use these, but it > does not hurt to share them, just in case. I am having a hard time deciding if I should take the Date: header of the patch e-mail into consideration. The munge thing looks serious enough, though. > Signed-off-by: Jeff King <peff@peff.net> > --- > .../howto/recover-corrupted-object-harder.txt | 237 +++++++++++++++++++++ > 1 file changed, 237 insertions(+) > > diff --git a/Documentation/howto/recover-corrupted-object-harder.txt > b/Documentation/howto/recover-corrupted-object-harder.txt > index 23e685d..9c4cd09 100644 > --- a/Documentation/howto/recover-corrupted-object-harder.txt > +++ b/Documentation/howto/recover-corrupted-object-harder.txt > @@ -240,3 +240,240 @@ But more importantly, git's hashing and checksumming noticed a problem > that easily could have gone undetected in another system. The result > still compiled, but would have caused an interesting bug (that would > have been blamed on some random commit). > + > + > +The adventure continues... > +-------------------------- > + > +I ended up doing this again! Same entity, new hardware. The assumption > +at this point is that the old disk corrupted the packfile, and then the > +corruption was migrated to the new hardware (because it was done by > +rsync or similar, and no fsck was done at the time of migration). 
> + > +This time, the affected blob was over 20 megabytes, which was far too > +large to do a brute-force on. I followed the instructions above to > +create the `zlib` file. I then used the `inflate` program below to pull > +the corrupted data from that. Examining that output gave me a hint about > +where in the file the corruption was. But now I was working with the > +file itself, not the zlib contents. So knowing the sha1 of the object > +and the approximate area of the corruption, I used the `sha1-munge` > +program below to brute-force the correct byte. > + > +Here's the inflate program (it's essentially `gunzip` but without the > +`.gz` header processing): > + > +-------------------------- > +#include <stdio.h> > +#include <string.h> > +#include <zlib.h> > +#include <stdlib.h> > + > +int main(int argc, char **argv) > +{ > + /* > + * oversized so we can read the whole buffer in; > + * this could actually be switched to streaming > + * to avoid any memory limitations > + */ > + static unsigned char buf[25 * 1024 * 1024]; > + static unsigned char out[25 * 1024 * 1024]; > + int len; > + z_stream z; > + int ret; > + > + len = read(0, buf, sizeof(buf)); > + memset(&z, 0, sizeof(z)); > + inflateInit(&z); > + > + z.next_in = buf; > + z.avail_in = len; > + z.next_out = out; > + z.avail_out = sizeof(out); > + > + ret = inflate(&z, 0); > + if (ret != Z_OK && ret != Z_STREAM_END) > + fprintf(stderr, "initial inflate failed (%d)\n", ret); > + > + fprintf(stderr, "outputting %lu bytes", z.total_out); > + fwrite(out, 1, z.total_out, stdout); > + return 0; > +} > +-------------------------- > + > +And here is the `sha1-munge` program: > + > +-------------------------- > +#include <stdio.h> > +#include <unistd.h> > +#include <string.h> > +#include <signal.h> > +#include <openssl/sha.h> > +#include <stdlib.h> > + > +/* eye candy */ > +static int counter = 0; > +static void progress(int sig) > +{ > + fprintf(stderr, "\r%d", counter); > + alarm(1); > +} > + > +static const signed 
char hexval_table[256] = { > + -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 18-1f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 20-27 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 28-2f */ > + 0, 1, 2, 3, 4, 5, 6, 7, /* 30-37 */ > + 8, 9, -1, -1, -1, -1, -1, -1, /* 38-3f */ > + -1, 10, 11, 12, 13, 14, 15, -1, /* 40-47 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 48-4f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 50-57 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 58-5f */ > + -1, 10, 11, 12, 13, 14, 15, -1, /* 60-67 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 68-67 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 70-77 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 78-7f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 80-87 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 88-8f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 90-97 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* 98-9f */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* a0-a7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* a8-af */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* b0-b7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* b8-bf */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* c0-c7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* c8-cf */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* d0-d7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* d8-df */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* e0-e7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* e8-ef */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* f0-f7 */ > + -1, -1, -1, -1, -1, -1, -1, -1, /* f8-ff */ > +}; > + > +static inline unsigned int hexval(unsigned char c) > +{ > +return hexval_table[c]; > +} > + > +static int get_sha1_hex(const char *hex, unsigned char *sha1) > +{ > + int i; > + for (i = 0; i < 20; i++) { > + unsigned int val; > + /* > + * hex[1]=='\0' is caught when val is checked below, > + * but if hex[0] is NUL we have to avoid reading > + * past the end of the string: > + */ > + if (!hex[0]) > + return -1; > + val = 
(hexval(hex[0]) << 4) | hexval(hex[1]); > + if (val & ~0xff) > + return -1; > + *sha1++ = val; > + hex += 2; > + } > + return 0; > +} > + > +int main(int argc, char **argv) > +{ > + /* oversized so we can read the whole buffer in */ > + static unsigned char buf[25 * 1024 * 1024]; > + char header[32]; > + int header_len; > + unsigned char have[20], want[20]; > + int start, len; > + SHA_CTX orig; > + unsigned i, j; > + > + if (!argv[1] || get_sha1_hex(argv[1], want)) { > + fprintf(stderr, "usage: sha1-munge <sha1> [start] <file.in\n"); > + return 1; > + } > + > + if (argv[2]) > + start = atoi(argv[2]); > + else > + start = 0; > + > + len = read(0, buf, sizeof(buf)); > + header_len = sprintf(header, "blob %d", len) + 1; > + fprintf(stderr, "using header: %s\n", header); > + > + /* > + * We keep a running sha1 so that if you are munging > + * near the end of the file, we do not have to re-sha1 > + * the unchanged earlier bytes > + */ > + SHA1_Init(&orig); > + SHA1_Update(&orig, header, header_len); > + if (start) > + SHA1_Update(&orig, buf, start); > + > + signal(SIGALRM, progress); > + alarm(1); > + > + for (i = start; i < len; i++) { > + unsigned char c; > + SHA_CTX x; > + > +#if 0 > + /* > + * deletion -- this would not actually work in practice, > + * I think, because we've already committed to a > + * particular size in the header. Ditto for addition > + * below. In those cases, you'd have to do the whole > + * sha1 from scratch, or possibly keep three running > + * "orig" sha1 computations going. > + */ > + memcpy(&x, &orig, sizeof(x)); > + SHA1_Update(&x, buf + i + 1, len - i - 1); > + SHA1_Final(have, &x); > + if (!memcmp(have, want, 20)) > + printf("i=%d, deletion\n", i); > +#endif > + > + /* > + * replacement -- note that this tries each of the 256 > + * possible bytes. If you suspect a single-bit flip, > + * it would be much shorter to just try the 8 > + * bit-flipped variants. 
> + */ > + c = buf[i]; > + for (j = 0; j <= 0xff; j++) { > + buf[i] = j; > + > + memcpy(&x, &orig, sizeof(x)); > + SHA1_Update(&x, buf + i, len - i); > + SHA1_Final(have, &x); > + if (!memcmp(have, want, 20)) > + printf("i=%d, j=%02x\n", i, j); > + } > + buf[i] = c; > + > +#if 0 > + /* addition */ > + for (j = 0; j <= 0xff; j++) { > + unsigned char extra = j; > + memcpy(&x, &orig, sizeof(x)); > + SHA1_Update(&x, &extra, 1); > + SHA1_Update(&x, buf + i, len - i); > + SHA1_Final(have, &x); > + if (!memcmp(have, want, 20)) > + printf("i=%d, addition=%02x", i, j); > + } > +#endif > + > + SHA1_Update(&orig, buf + i, 1); > + counter++; > + } > + > + alarm(0); > + fprintf(stderr, "\r%d\n", counter); > + return 0; > +} > +-------------------------- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] howto: document more tools for recovery corruption 2015-04-01 22:21 ` Junio C Hamano @ 2015-04-02 0:49 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2015-04-02 0:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, Apr 01, 2015 at 03:21:16PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Long ago, I documented a corruption recovery I did and gave > > some C code that I used to help find a flipped bit. I had > > to fix a similar case recently, and I ended up writing a few > > more tools. I hope nobody ever has to use these, but it > > does not hurt to share them, just in case. > > I am having a hard time deciding if I should take the Date: header > of the patch e-mail into consideration. The munge thing looks > serious enough, though. Heh, no, this is sadly a serious thing that I did today (but I was able to detect and correct a single flipped bit in a 60MB packfile, which is kind of neat, I guess). I hesitated sending them at all because they are not really note-worthy. OTOH, during today's exercise I found the instructions and sample program I had written last time to be very useful, so perhaps it can help somebody (or even me) at some later date. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-04-02 0:50 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-10-16 8:34 pack corruption post-mortem Jeff King 2013-10-16 8:59 ` Duy Nguyen 2013-10-16 15:41 ` Martin Fick 2013-10-17 0:35 ` Jeff King 2013-10-17 15:47 ` Junio C Hamano 2013-10-25 7:55 ` Jeff King 2013-10-17 1:06 ` Duy Nguyen 2013-10-19 10:32 ` Duy Nguyen 2013-10-19 14:41 ` Nicolas Pitre 2013-10-19 19:17 ` Shawn Pearce 2013-10-20 20:56 ` Nicolas Pitre 2013-10-20 4:44 ` Duy Nguyen 2013-10-20 21:08 ` Nicolas Pitre 2015-04-01 21:08 ` [PATCH] howto: document more tools for recovery corruption Jeff King 2015-04-01 22:21 ` Junio C Hamano 2015-04-02 0:49 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).