From: Linus Torvalds <torvalds@osdl.org>
To: Sergio Callegari <scallegari@arces.unibo.it>
Cc: git@vger.kernel.org
Subject: Re: Problematic git pack
Date: Thu, 31 Aug 2006 14:33:06 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0608311416060.27779@g5.osdl.org>
In-Reply-To: <44F6A198.4040902@arces.unibo.it>



On Thu, 31 Aug 2006, Sergio Callegari wrote:
>
> Junio, can you please send me privately details about [*1*] so I can retrieve
> the pack also?

He already did, search for "members.cox.net" in your email archive (it's 
Message-ID: <7v7j0qihwl.fsf@assigned-by-dhcp.cox.net> to be precise).

> I also have another question... (maybe it was answered in some previous thread
> on this list, in this case a pointer would be enough).
> Now I am going to have the fixed archive and also a new archive, which I
> restarted from the latest working copy I had of my project.
> Is there any way to automatically do real "surgery" to attach one to the other
> and get a single archive with all the history?

Yes. This is just what a "grafts" file is for.

Put the old pack/idx files into the .git/objects/pack directory, and then 
you can create "fake parenthood" information in a ".git/info/grafts" file 
by just adding text lines of the format "<sha1> <fakeparentsha1>" (with 
each SHA being the regular 40-character hex representation).
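
As a rough illustration of that line format, here is a minimal Python sketch 
that builds one grafts line. The helper name graft_line and the two commit 
IDs are made up for the example; only the "<sha1> <fakeparentsha1>" layout 
comes from the description above.

```python
import re

# A commit ID in a grafts file is 40 lowercase hex characters.
_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def graft_line(commit: str, *parents: str) -> str:
    """Return one grafts line: the commit ID followed by its fake parents."""
    ids = [commit, *parents]
    for object_id in ids:
        if not _SHA1_HEX.fullmatch(object_id):
            raise ValueError(f"not a 40-character hex SHA1: {object_id!r}")
    return " ".join(ids) + "\n"


# Hypothetical IDs: the root commit of the restarted history, and the
# last commit of the recovered old history that should become its parent.
new_root = "a" * 40
old_tip = "b" * 40
line = graft_line(new_root, old_tip)

# Appending `line` to .git/info/grafts would attach the two histories.
print(line, end="")
```

The validation is only there to catch truncated or mistyped IDs before they 
end up in the file.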

> Obviously, if I try to change a commit object to modify its parents, its
> signature changes, so I need to modify its childs and so on, is this correct?
> Alternatively I belive that grafts should be a way to go... I had never used
> them before, do all git tools support them? Particularly do they get pushed
> and pulled correctly?

Nope, they won't get pushed and pulled: you need to put the grafts file 
in every repository yourself. Alternatively, you can re-create the 
whole history, I think cogito had some history re-writing tool.

> > So the _real_ difference is literally just the one byte at offset 0151000
> > (decimal 53760) which in the fixed pack is 0x96, and in the corrupt pack it
> > is 0x94. That's a single-bit difference (bit #1 has been cleared).
> 
> So, possibly, the alpha particle theory could be the plausible one in the
> end...

Yes. It's just that Junio's original theory required it to not just hit a 
memory cell, it also had to hit it at _just_ the right time in between 
being written and the SHA1 of the buffer being computed. So the original 
theory was very unlikely indeed.

My theory, that the corruption just caused the SHA1 to be re-computed when 
repacking (silently copying the corruption without realizing it), meant 
that there was no such small and unlikely window: any regular memory 
(or disk) corruption could easily have caused it at any time, and then a 
subsequent re-pack "fixed" the SHA1 to match the corruption.
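
The arithmetic and the "fixed" checksum can be sketched in Python. This is a 
simplified model and not git's actual pack format: the payload is a zeroed 
buffer, and the helpers with_trailer and trailer_ok are invented for the 
example. It assumes only that the data carries a trailing SHA1 which a 
repack would recompute over whatever bytes it copies.

```python
import hashlib

OFFSET = 0o151000  # the octal offset quoted above (53760 decimal)


def flip_bit(data: bytes, offset: int, bit: int) -> bytes:
    """Return a copy of data with one bit toggled at the given byte offset."""
    out = bytearray(data)
    out[offset] ^= 1 << bit
    return bytes(out)


def with_trailer(payload: bytes) -> bytes:
    """Append a 20-byte SHA1 of the payload, as a stand-in for a checksum."""
    return payload + hashlib.sha1(payload).digest()


def trailer_ok(blob: bytes) -> bool:
    """Check that the trailing 20 bytes are the SHA1 of everything before."""
    return hashlib.sha1(blob[:-20]).digest() == blob[-20:]


# Toy payload with the "good" byte 0x96 at the reported offset.
payload = bytearray(OFFSET + 1)
payload[OFFSET] = 0x96
good = with_trailer(bytes(payload))

# Clearing bit #1 turns 0x96 into 0x94.
corrupt_payload = flip_bit(bytes(payload), OFFSET, 1)

stale = corrupt_payload + good[-20:]     # corruption, old checksum: detected
repacked = with_trailer(corrupt_payload)  # checksum recomputed: looks fine

print(trailer_ok(good), trailer_ok(stale), trailer_ok(repacked))
```

In this model the corruption is only detectable while the old checksum is 
still attached; once the checksum is recomputed over the damaged bytes, the 
data is self-consistent again.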

> The bad thing is that I don't know which of my two machines (the laptop or the
> desktop) caused the issue!

I'd suggest running memtest86 for a few days on both (not necessarily at 
the same time - keep one working machine to do your job on ;)

> > Finally, this also points out that the corrupted packs _can_ be fixed, but I
> > think Sergio was a bit lucky (to offset all the bad luck). Sergio still had
> > access to the original file that had had its object corrupted. 
>
> Actually, this could possibly be a not so rare case... In my tree I had the
> development of some LaTeX documents and packages (code like, the really
> "precious" files) and a few binary objects (images and openoffice files
> mainly, by far less precious).

Sure. In your case you had checked in generated files too, and yes, they 
were the larger ones. That's not true in general - in many other projects, 
the _directory_ structure (ie the git "tree" objects) will be a large 
portion of the project, and probably more likely to be corrupt. Now, to 
some degree the tree objects are likely the ones easiest to "repair" 
(because you can try to look at the history and figure things out by 
hand), but at the same time, people also tend to have deeper delta-chains 
and it would just be _very_ painful.

So I do think you were somewhat lucky.

> Finally, having a command to create an object out of a single file (contrary
> of git cat-file) could help re-creating the missing objects...

Hmm. Like "git-hash-object"?
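
For reference, the ID that git-hash-object reports for a file stored as a 
blob can be reproduced with a few lines of Python. This sketch assumes the 
SHA1 object format and only computes the ID; it does not write anything into 
the object database. The function name blob_id is made up for the example.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA1 object ID git assigns to a blob with this content."""
    # A blob is hashed as: "blob <size in bytes>", a NUL byte, the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID.
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If the original file still exists, comparing its computed ID against the ID 
of the missing object is a quick way to confirm it is the right content 
before re-creating the object.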

			Linus
