git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Johannes Sixt <j.sixt@viscovery.net>,
	Sergio Callegari <sergio.callegari@gmail.com>,
	git@vger.kernel.org
Subject: Re: git fsck not identifying corrupted packs
Date: Mon, 19 Oct 2009 12:03:42 -0700
Message-ID: <7v7hur1a0h.fsf@alter.siamese.dyndns.org>
In-Reply-To: alpine.DEB.1.00.0910191202020.4985@pacific.mpi-cbg.de

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Mon, 19 Oct 2009, Johannes Sixt wrote:
>
>> Sergio Callegari schrieb:
>> > Is there a means to have fsck do a truly full check on the sanity of a
>> > repo?
>> 
>> git fsck --full
>> 
>> RTFM, please.
>
> Now, now.
>
> If you were to test a new filesystem, say, wonderfulfs, and wanted to 
> check its integrity, would you not just run "fsck-wonderfulfs" if that 
> exists, rather than reading the fantamagastic manual?  Would you not 
> expect that it Does The Right Thing?  Would you not expect that it 
> follows the Law Of Minimal Surprise?
>
> So FWIW I can see where Sergio is coming from.

Linus and other git developers from the early days trained their fingers
to type the command every once in a while, even without thinking, to check
the consistency of the repository back when the lower core part of git
was still being developed.  Developers who wanted to make sure that git
dealt with packfiles correctly could deliberately trigger their creation
and carefully check them afterwards, but loose objects are the ones that
are written by various commands from random codepaths.  For that reason,
it made some technical sense, from a debugging point of view, to have a
mode that checked only loose objects.
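
For readers who have not looked at the object store: a loose object is a
zlib-deflated file whose inflated bytes are a header ("<type> <size>"
followed by a NUL) plus the content, and the object name is the SHA-1 of
those inflated bytes.  A minimal sketch of checking one such object might
look like the following; loose_object_ok is a hypothetical helper, not
git's implementation, and it assumes a SHA-1 repository.

```python
import hashlib
import zlib


def loose_object_ok(compressed: bytes, expected_hex: str) -> bool:
    """Return True if a loose object's bytes match its expected name.

    Hypothetical helper for illustration; not how git itself is written.
    """
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # damaged deflate stream

    # Inflated layout: b"<type> <size>\0<content>"
    header, sep, body = raw.partition(b"\0")
    if not sep:
        return False
    try:
        _obj_type, size = header.split(b" ")
        if int(size.decode("ascii")) != len(body):
            return False  # recorded size disagrees with the content
    except ValueError:
        return False  # malformed header

    # The object name is the SHA-1 of the whole inflated buffer.
    return hashlib.sha1(raw).hexdigest() == expected_hex
```

In a real repository the compressed bytes would come from a file under
.git/objects/, and the expected name from its directory and file name
joined together.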

    Side note.  I think the help description of --full option is wrong (or
    at least stale).  We always look at alternate object store these days
    since e15ef66 (fsck: check loose objects from alternate object stores
    by default, 2009-01-30).  It probably should read "check packed
    objects fully" or something.

The above paragraph is merely historical background, and in this case
the "history" refers to early-to-mid 2005.  Even for git developers there
is no longer any reason to type "git fsck" for fear that some newly created
objects might be corrupt due to a recent change to git.

The reason we did not make "--full" the default is probably that we trust
our filesystems a bit too much.  At least, we trusted filesystems more than
we trusted the lower core part of git that was under development ;-)

Once a packfile is created, we only ever read it, so there didn't seem
to be much point in suspecting that the underlying filesystems or disks
might corrupt it in a way that is not caught by the SHA-1 checksum over
the entire packfile and the per-object checksums.  That trust in the
filesystems might have been a good tradeoff between fsck performance
and reliability on the platforms git was initially developed on and for,
but it may no longer hold as we run on more platforms these days.
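
As a concrete picture of the whole-pack checksum mentioned above: a
packfile starts with a 12-byte header ("PACK", a version, an object
count) and ends with a 20-byte SHA-1 of everything that precedes it.  The
sketch below checks only that trailer.  pack_trailer_ok is a hypothetical
helper name, not part of git, and it takes the file contents as bytes
rather than streaming them.  It does not look at individual objects at
all.

```python
import hashlib


def pack_trailer_ok(pack: bytes) -> bool:
    """Check the trailing SHA-1 that covers all preceding bytes of a pack.

    Hypothetical helper for illustration; assumes a SHA-1 packfile.
    """
    header_len, trailer_len = 12, 20
    if len(pack) < header_len + trailer_len or pack[:4] != b"PACK":
        return False
    body, trailer = pack[:-trailer_len], pack[-trailer_len:]
    return hashlib.sha1(body).digest() == trailer
```

For real use, "git verify-pack" is the tool that validates a pack and its
index; the function above is only meant to show what the trailing
checksum covers.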

It probably makes sense to ship 1.7.0 with a version of "fsck" in which
"--full" is the default; it would still accept "--full" but it would be a
no-op.  This would be a backward incompatible change, but the difference
is primarily about performance ("it takes a lot longer than before!"), and
not correctness, so we probably can live with it.  As I already said,
there is not much reason to run "fsck" every five minutes anymore to begin
with (unless your filesystem is so unreliable that it might eat one file
every five minutes, that is).

It probably is also a good idea to add a "--loose" option that does what
"fsck" currently does without "--full".  It is a good name because (1) to
people who do not know the internals of git, it means "check only loosely",
which would discourage them from running "fsck" with that option to begin
with, and (2) to others, it tells exactly what the option makes the
command check.
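
The proposed option semantics can be sketched as follows.  This is only
an illustration of the idea, written with Python's argparse rather than
git's own option parser; parse_fsck_mode is a hypothetical name, and
what should happen when both options are given is not spelled out in the
proposal, so letting "--loose" win is purely an assumption of this
sketch.

```python
import argparse


def parse_fsck_mode(argv):
    """Map command-line options to a check mode under the proposal.

    Hypothetical sketch; not git's actual option handling.
    """
    parser = argparse.ArgumentParser(prog="fsck-sketch")
    # Still accepted for backward compatibility, but it changes nothing.
    parser.add_argument("--full", action="store_true",
                        help="accepted for compatibility; no-op")
    # Restores what plain "fsck" did before the default changed.
    parser.add_argument("--loose", action="store_true",
                        help="check loose objects only")
    args = parser.parse_args(argv)
    # Assumption: if both are given, --loose takes precedence.
    return "loose" if args.loose else "full"
```

With this mapping, an empty option list and "--full" both select the
full check, and only "--loose" selects the old behaviour.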

