git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ramsay Jones <ramsay@ramsayjones.plus.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Jason@zx2c4.com, GIT Mailing-list <git@vger.kernel.org>
Subject: Re: [PATCH] fsck: check skiplist for object in fsck_blob()
Date: Thu, 28 Jun 2018 17:56:18 +0100	[thread overview]
Message-ID: <0a18acbd-0124-1c92-0046-05b8b035dd28@ramsayjones.plus.com> (raw)
In-Reply-To: <20180628114912.GA12901@sigill.intra.peff.net>



On 28/06/18 12:49, Jeff King wrote:
> On Wed, Jun 27, 2018 at 07:39:53PM +0100, Ramsay Jones wrote:
> 
>> Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
>> fsck will issue an error message for '.gitmodules' content that cannot
>> be parsed correctly. This is the case, even when the corresponding blob
>> object has been included on the skiplist. For example, using the cgit
>> repository, we see the following:
>>
>>   $ git fsck
>>   Checking object directories: 100% (256/256), done.
>>   error: bad config line 5 in blob .gitmodules
>>   error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
>>   Checking objects: 100% (6626/6626), done.
>>   $
>>
>>   $ git config fsck.skiplist '.git/skip'
>>   $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
>>   $
>>
>>   $ git fsck
>>   Checking object directories: 100% (256/256), done.
>>   error: bad config line 5 in blob .gitmodules
>>   Checking objects: 100% (6626/6626), done.
>>   $
>>
>> Note that the error message issued by the config parser is still
>> present, despite adding the object-id of the blob to the skiplist.
>>
>> One solution would be to provide a means of suppressing the messages
>> issued by the config parser. However, given that (logically) we are
>> asking fsck to ignore this object, a simpler approach is to just not
>> call the config parser if the object is to be skipped. Add a check to
>> the 'fsck_blob()' processing function, to determine if the object is
>> on the skiplist and, if so, exit the function early.
> 
> Yeah, this solution seems sensible. Given that we would never report any
> error for that blob, there is no point in even looking at it. I wonder
> if we ought to do the same for other types, too. Is there any point in
> opening a tree that is in the skiplist?

Note that the 'blob' object has already been 'loaded' at this
point anyway (and the basic 'object' checks have been done).

I did think about this, briefly, but decided that we should
only 'skip' the leaf nodes (blobs). (So, when processing
commits, trees and tags, we should not report an error for
that object-id, but that should not stop us from checking
the tree object associated with a commit, just because of
a problem with the commit message).

[Oh, wait - Hmm, each object could be checked independently
of all others and not used for any object traversal right?
(e.g. using packfile index). I saw fsck_walk() and didn't
look any further ... Ah, broken link check, ... I obviously
need to read the code some more ... :-D ]

>> I noticed recently that the 'cgit.git' repo was complaining when
>> doing an 'git fsck' ...
>>
>> Note that 'cgit' had a '.gitmodules' file and a 'submodule.sh'
>> script back in 2007, which had nothing to do with the current
>> git submodule facilities, ... ;-)
> 
> Yikes. I worried about that sort of regression when adding the
> .gitmodules checks. But this _is_ a pretty unique case: somebody was
> implementing their own version of submodules (pre-git-submodule) and
> decided to use the same name. So I'm not sure if this is a sign that we
> need to think through the regression, or a sign that it really is rare.

I don't have any numbers, but my gut tells me that this would
be very rare indeed. Of course, my gut has been wrong before ... :-D

> One thing we could do is turn the parse failure into a noop. The main
> point of the check is to protect people against the malicious
> .gitmodules bug. If the file can't be parsed, then it also can't be an
> attack vector (assuming the parser used for the fsck check and the
> parser used by the victim behave identically).

Hmm, yeah, but I would have to provide a means of suppressing
the config parser error messages. Something I wanted to avoid. :(

> That wouldn't help with your stray message, of course, but it would
> eliminate the need to deal with the skiplist here (and skiplists aren't
> always easy to do -- for instance, pushing up a non-fork of cgit to
> GitHub would now be rejected because of this old file, and you'd have to
> contact support to resolve it).

Good point.

>> I just remembered that I had intended to review the name of the
>> function that this patch introduces before sending, but totally
>> forgot! :(
>>
>> [Hmm, 'to_be_skipped' -> object_to_be_skipped, object_on_skiplist,
>> etc., ... suggestions welcome!]
> 
> The current name is OK to be, but object_on_skiplist() also seems quite
> obvious.

object_on_skiplist() it is!

Junio, do you want me to address the above 'rejected push'
issue in this patch, or with a follow-up patch? (It should
be a pretty rare problem - famous last words!)

ATB,
Ramsay Jones




  parent reply	other threads:[~2018-06-28 16:56 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-27 18:39 [PATCH] fsck: check skiplist for object in fsck_blob() Ramsay Jones
2018-06-28 11:49 ` Jeff King
2018-06-28 16:39   ` Junio C Hamano
2018-06-28 17:30     ` Jeff King
2018-06-28 16:56   ` Ramsay Jones [this message]
2018-06-28 17:28     ` Junio C Hamano
2018-06-28 17:45     ` Jeff King
2018-06-28 18:53       ` Ramsay Jones
2018-06-28 22:03         ` Jeff King
2018-06-28 22:05           ` [PATCH 1/4] config: turn die_on_error into caller-facing enum Jeff King
2018-06-28 22:05           ` [PATCH 2/4] config: add CONFIG_ERROR_SILENT handler Jeff King
2018-06-28 22:05           ` [PATCH 3/4] config: add options parameter to git_config_from_mem Jeff King
2018-06-28 22:06           ` [PATCH 4/4] fsck: silence stderr when parsing .gitmodules Jeff King
2018-06-28 22:12             ` Jeff King
2018-06-29  1:14               ` Ramsay Jones
2018-06-29  1:10           ` [PATCH] fsck: check skiplist for object in fsck_blob() Ramsay Jones
2018-07-03 14:34             ` Jeff King
2018-07-04  0:12               ` Ramsay Jones
2018-07-07  1:32                 ` Jeff King
2018-07-11 19:31                   ` Ramsay Jones
2018-07-13 19:37                     ` Ramsay Jones
2018-07-13 19:41                       ` Jeff King
2018-07-13 19:46                         ` Jeff King
2018-07-13 20:08                           ` Ramsay Jones
2018-07-13 19:38                     ` Jeff King
2018-07-13 19:39                       ` [PATCH 1/2] fsck: split ".gitmodules too large" error from parse failure Jeff King
2018-07-13 19:39                       ` [PATCH 2/2] fsck: downgrade gitmodulesParse default to "info" Jeff King
2018-07-13 20:21                         ` Stefan Beller
2018-07-16 18:04                         ` Junio C Hamano
2018-07-16 18:30                           ` Jeff King
2018-07-16 21:08                             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a18acbd-0124-1c92-0046-05b8b035dd28@ramsayjones.plus.com \
    --to=ramsay@ramsayjones.plus.com \
    --cc=Jason@zx2c4.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).