git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Checks added for CVE-2018-11235 too restrictive?
@ 2018-07-03  7:06 Mike Hommey
  2018-07-03 14:15 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Hommey @ 2018-07-03  7:06 UTC (permalink / raw)
  To: peff, git

Hi,

(Background) I'm the author of a git remote-helper that can talk
directly to mercurial servers, known as git-cinnabar. One of the design
decisions was to use git objects to store all the metadata necessary to
reconstruct mercurial changesets, manifests and files, and one special
thing that's done is that git trees are (ab)used to store mercurial
sha1s in commit references. This design decision was made so that the
metadata could possibly be exchanged, allowing to jumpstart clone of
large repositories faster than by converting from scratch from
mercurial.

I had a first shot at that a few months ago, but the format of the
metadata branch made it impossible to push to github, hitting its push
size limit. With some pre-processing work, I was able to push the data
to github, with the intent to come back with a new metadata branch
format that would make the push directly possible.

Fast forward to this week, where I was trying to upload a new metadata
branch, and failed in a rather interesting way: multiple lines looking
like:

remote: error: object 81eae74b046d284c47e788143bbbcc681cb53418: gitmodulesMissing: unable to read .gitmodules blob

which, apart from the fact that they have some formatting issue, appear
to be new from the fixes for CVE-2018-11235.

I can see what those fixes are trying to prevent, but they seem to be
overly restrictive, at least in the context of transfer.fsckObjects.

The core problem is that the mercurial repository has some .gitmodules
files in some subdirectories. As a consequence, the git objects storing
the info for those mercurial files contain trees with .gitmodules files
with a commit ref, and that's what the remote is complaining about.

(Surpringly, it doesn't hit the "non-blob found at .gitmodules" case
first, which is weird).

A small testcase to reproduce looks like this:

$ git init bar; cd bar
$ git fast-import <<EOF
commit refs/heads/bar
committer Bar <bar@bar> 0 +0000
data 0
                                                                 
M 160000 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
                                                                 
EOF
$ git init ../qux
$ git -C ../qux config receive.fsckObjects true
$ git push -q ../qux bar
remote: error: object 81eae74b046d284c47e788143bbbcc681cb53418: gitmodulesMissing: unable to read .gitmodules blob
remote: fatal: fsck error in pack objects
error: remote unpack failed: unpack-objects abnormal exit
To ../qux
 ! [remote rejected] bar -> bar (unpacker error)
error: failed to push some refs to '../qux'

Would it be reasonable to make the transfer.fsckObject checks ignore
non-blob .gitmodules?

Cheers,

Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checks added for CVE-2018-11235 too restrictive?
  2018-07-03  7:06 Checks added for CVE-2018-11235 too restrictive? Mike Hommey
@ 2018-07-03 14:15 ` Jeff King
  2018-07-03 17:45   ` Junio C Hamano
  2018-07-03 22:30   ` Mike Hommey
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff King @ 2018-07-03 14:15 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git, Ramsay Jones

On Tue, Jul 03, 2018 at 04:06:50PM +0900, Mike Hommey wrote:

> I had a first shot at that a few months ago, but the format of the
> metadata branch made it impossible to push to github, hitting its push
> size limit. With some pre-processing work, I was able to push the data
> to github, with the intent to come back with a new metadata branch
> format that would make the push directly possible.

This is sort of tangential to your email here, but I'm curious which
limit you hit. Too-big blob, tree, or commit?

> Fast forward to this week, where I was trying to upload a new metadata
> branch, and failed in a rather interesting way: multiple lines looking
> like:
> 
> remote: error: object 81eae74b046d284c47e788143bbbcc681cb53418: gitmodulesMissing: unable to read .gitmodules blob
> 
> which, apart from the fact that they have some formatting issue, appear
> to be new from the fixes for CVE-2018-11235.

Also tangential to your main point (I promise I'll get to it ;) ), but
what formatting issue do you mean? Are you referring to
"gitmodulesMissing"? That's a machine-readable tag which means you can
set "fsck.gitmodulesMissing" to "ignore".

> I can see what those fixes are trying to prevent, but they seem to be
> overly restrictive, at least in the context of transfer.fsckObjects.
> 
> The core problem is that the mercurial repository has some .gitmodules
> files in some subdirectories. As a consequence, the git objects storing
> the info for those mercurial files contain trees with .gitmodules files
> with a commit ref, and that's what the remote is complaining about.
> 
> (Surpringly, it doesn't hit the "non-blob found at .gitmodules" case
> first, which is weird).

The reason it doesn't hit the "non-blob" case is that we are trying to
analyze the object itself. So we don't pay any attention to the mode in
the containing tree, but instead literally look for 81eae74b0. If it
were a non-blob we'd complain then, but in fact we don't have it at all
(which is otherwise OK because it's a gitlink).

> A small testcase to reproduce looks like this:
> 
> $ git init bar; cd bar
> $ git fast-import <<EOF
> commit refs/heads/bar
> committer Bar <bar@bar> 0 +0000
> data 0
>                                                                  
> M 160000 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
>                                                                  
> EOF
>
> [...]
> 
> Would it be reasonable to make the transfer.fsckObject checks ignore
> non-blob .gitmodules?

I'm open to the idea that the new checks are too restrictive (both this
and the gitmodulesParse error we're discussing in [1]). They definitely
outlaw things that _used_ to be OK. And the security benefit is a little
hand-wavy. They're not strictly needed to block the recent
vulnerability[2].  However, they _could_ protect us from future problems
(e.g., an overflow in the config-parsing code, which is not accessible
to attackers outside of .gitmodules). So there is some value in being
restrictive, but it's mostly hypothetical for now.

So I think we could simply loosen these cases without immediate effect.
That said, I'm not sure that having a gitlink .gitmodules is sane in the
first place. Attempting to "git submodule init" there is going to cause
errors. Well, sort of -- your example actually includes it in a
subdirectory of the repository, so technically we wouldn't use it for
real submodules. That fudging (finding .gitmodules anywhere instead of
just at the root) was a conscious decision to reduce the complexity and
cost of the check.

It sounds like in this case you don't have existing history that does
this with .gitmodules, but are rather looking into designing a new
feature that stuffs data into git trees. I'd be interested to hear a bit
more about that feature to see if there are other alternatives.

[1] https://public-inbox.org/git/2fc2d53f-e193-2a2a-9f8f-b3e1d256d940@ramsayjones.plus.com/

[2] If you do not have the object, or cannot parse it as config, it
    cannot have a malicious config name in it. We do have to be wary of
    cases where we get the tree in one push, and then later get the
    pointed-to object. But I think for anything _except_ a gitlink, we'd
    complain about the missing object for the first push. And for a
    gitlink, we wouldn't check out the blob, so we're OK.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checks added for CVE-2018-11235 too restrictive?
  2018-07-03 14:15 ` Jeff King
@ 2018-07-03 17:45   ` Junio C Hamano
  2018-07-03 18:34     ` Jeff King
  2018-07-03 22:30   ` Mike Hommey
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2018-07-03 17:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Mike Hommey, git, Ramsay Jones

Jeff King <peff@peff.net> writes:

>> A small testcase to reproduce looks like this:
>> 
>> $ git init bar; cd bar
>> $ git fast-import <<EOF
>> commit refs/heads/bar
>> committer Bar <bar@bar> 0 +0000
>> data 0
>>                                                                  
>> M 160000 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
>>                                                                  
>> EOF
>>
>> [...]
>> 
>> Would it be reasonable to make the transfer.fsckObject checks ignore
>> non-blob .gitmodules?
>
> I'm open to the idea that the new checks are too restrictive (both this
> and the gitmodulesParse error we're discussing in [1]). They definitely
> outlaw things that _used_ to be OK. And the security benefit is a little
> hand-wavy. They're not strictly needed to block the recent
> vulnerability[2].  However, they _could_ protect us from future problems
> (e.g., an overflow in the config-parsing code, which is not accessible
> to attackers outside of .gitmodules). So there is some value in being
> restrictive, but it's mostly hypothetical for now.

As you said elsewhere, the above example would probably not cause
harm to the production use of .gitmodules (as opposed to what fsck
checks) as it is not at the top of the working tree, but does the
production code do a wrong thing if a directory, a symbolic link or
a gitlink appear at ".gitmodules" at the top level?

If so, we probably need to tighten code in submodules.c and
elsewhere to ignore such ".gitmodules" directory etc [*1*].

And if not, I think it is OK to loosen fsck to exclude ".gitmodules"
that are not regular files, e.g. in fsck.c::fsck_tree(), we have
this for each entry we encounter in a tree:

		...
		has_zero_pad |= *(char *)desc.buffer == '0';

		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
			if (!S_ISLNK(mode))
				oidset_insert(&gitmodules_found, oid);
			else
				retval += report(options, &item->object,
						 FSCK_MSG_GITMODULES_SYMLINK,
						 ".gitmodules is a symbolic link");
		}

Can mode be 160000 or 40000 here?  The code seems to assume that, as
long as the thing is named ".gitmodules", we are looking at a blob,
so symlinks are worth reporting immediately and all others are worth
investigating later by marking them in the gitmodules_found oidset.

I think it is a good idea to be tighter than necessary by default,
so rejecting a gitlink ".gitmodules" unless the user tells us with
some configuration knob is a good thing to do, but it probably makes
sense to have users a way to opt-out of this "There is .gitmodules
but it is not a readable blob" sanity check.



[Footnote]

*1* A quick scan of the code there looking for GITMODULES_FILE did
not give me a very good feeling---it seems that the code trusts what
is in the index and in the working tree to be non-hostile a bit too
much.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checks added for CVE-2018-11235 too restrictive?
  2018-07-03 17:45   ` Junio C Hamano
@ 2018-07-03 18:34     ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2018-07-03 18:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mike Hommey, git, Ramsay Jones

On Tue, Jul 03, 2018 at 10:45:51AM -0700, Junio C Hamano wrote:

> > I'm open to the idea that the new checks are too restrictive (both this
> > and the gitmodulesParse error we're discussing in [1]). They definitely
> > outlaw things that _used_ to be OK. And the security benefit is a little
> > hand-wavy. They're not strictly needed to block the recent
> > vulnerability[2].  However, they _could_ protect us from future problems
> > (e.g., an overflow in the config-parsing code, which is not accessible
> > to attackers outside of .gitmodules). So there is some value in being
> > restrictive, but it's mostly hypothetical for now.
> 
> As you said elsewhere, the above example would probably not cause
> harm to the production use of .gitmodules (as opposed to what fsck
> checks) as it is not at the top of the working tree, but does the
> production code do a wrong thing if a directory, a symbolic link or
> a gitlink appear at ".gitmodules" at the top level?
> 
> If so, we probably need to tighten code in submodules.c and
> elsewhere to ignore such ".gitmodules" directory etc [*1*].

I tried it when writing my earlier message, and I couldn't convince it
to do anything too terrible (these both have one true submodule and a
gitlink .gitmodules):

  $ git status
  warning: unable to access '/home/peff/tmp/.gitmodules': Is a directory
  On branch master
  nothing to commit, working tree clean

  $ git submodule update --init
  warning: unable to access '/home/peff/tmp/.gitmodules': Is a directory
  fatal: No url found for submodule path '.gitmodules' in .gitmodules

Obviously the setup is broken, but I think most code paths would hit
that fopen() failure. Possibly something that reads directly from the
index might not check the object type carefully and could read cruft
from a non-blob (though my understanding from past discussions is that
we don't actually do direct from-index reading yet).

> And if not, I think it is OK to loosen fsck to exclude ".gitmodules"
> that are not regular files, e.g. in fsck.c::fsck_tree(), we have
> this for each entry we encounter in a tree:
> 
> 		...
> 		has_zero_pad |= *(char *)desc.buffer == '0';
> 
> 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
> 			if (!S_ISLNK(mode))
> 				oidset_insert(&gitmodules_found, oid);
> 			else
> 				retval += report(options, &item->object,
> 						 FSCK_MSG_GITMODULES_SYMLINK,
> 						 ".gitmodules is a symbolic link");
> 		}
> 
> Can mode be 160000 or 40000 here?  The code seems to assume that, as
> long as the thing is named ".gitmodules", we are looking at a blob,
> so symlinks are worth reporting immediately and all others are worth
> investigating later by marking them in the gitmodules_found oidset.

Right, that's more or less how the logic works now. Which actually
brings up another philosophical issue. We wouldn't want to loosen the
symlink check here, since it is serving an actual security purpose.  So
if we were to loosen the other bits (say, allowing .gitmodules to be a
non-blob), we still could not say "...and there is no regression with
respect to all old .gitmodules tree entries". Some would work, and some.
would not. Is it better to be consistent ("we assume .gitmodules entries
are well-formed, and do not use that name if you are not") or to try to
be as permissive as possible?

> I think it is a good idea to be tighter than necessary by default,
> so rejecting a gitlink ".gitmodules" unless the user tells us with
> some configuration knob is a good thing to do, but it probably makes
> sense to have users a way to opt-out of this "There is .gitmodules
> but it is not a readable blob" sanity check.

That already exists using the fsck "ignore" config and skiplist. I think
the question is just whether it is an undue burden to have to deal with
those. Setting a config option is not too bad, but quite often you are
crossing an administrative boundary (convincing a hoster to loosen fsck,
or convincing users who fetch from you that they need to set a certain
config variable).

> *1* A quick scan of the code there looking for GITMODULES_FILE did
> not give me a very good feeling---it seems that the code trusts what
> is in the index and in the working tree to be non-hostile a bit too
> much.

I won't be surprised if there is a way to cause mischief there. The
extent of my probing was the two commands I showed above. :)

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checks added for CVE-2018-11235 too restrictive?
  2018-07-03 14:15 ` Jeff King
  2018-07-03 17:45   ` Junio C Hamano
@ 2018-07-03 22:30   ` Mike Hommey
  2018-07-04  8:18     ` Mike Hommey
  1 sibling, 1 reply; 6+ messages in thread
From: Mike Hommey @ 2018-07-03 22:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ramsay Jones

On Tue, Jul 03, 2018 at 10:15:19AM -0400, Jeff King wrote:
> On Tue, Jul 03, 2018 at 04:06:50PM +0900, Mike Hommey wrote:
> 
> > I had a first shot at that a few months ago, but the format of the
> > metadata branch made it impossible to push to github, hitting its push
> > size limit. With some pre-processing work, I was able to push the data
> > to github, with the intent to come back with a new metadata branch
> > format that would make the push directly possible.
> 
> This is sort of tangential to your email here, but I'm curious which
> limit you hit. Too-big blob, tree, or commit?

Too-big pack, iirc.

> > Fast forward to this week, where I was trying to upload a new metadata
> > branch, and failed in a rather interesting way: multiple lines looking
> > like:
> > 
> > remote: error: object 81eae74b046d284c47e788143bbbcc681cb53418: gitmodulesMissing: unable to read .gitmodules blob
> > 
> > which, apart from the fact that they have some formatting issue, appear
> > to be new from the fixes for CVE-2018-11235.
> 
> Also tangential to your main point (I promise I'll get to it ;) ), but
> what formatting issue do you mean? Are you referring to
> "gitmodulesMissing"? That's a machine-readable tag which means you can
> set "fsck.gitmodulesMissing" to "ignore".

Oh, I was reading it as something finishing with "gitmodules", and
another message starting with "Missing". The fact the error is so
long and on one line made me think that, I guess.

> > I can see what those fixes are trying to prevent, but they seem to be
> > overly restrictive, at least in the context of transfer.fsckObjects.
> > 
> > The core problem is that the mercurial repository has some .gitmodules
> > files in some subdirectories. As a consequence, the git objects storing
> > the info for those mercurial files contain trees with .gitmodules files
> > with a commit ref, and that's what the remote is complaining about.
> > 
> > (Surpringly, it doesn't hit the "non-blob found at .gitmodules" case
> > first, which is weird).
> 
> The reason it doesn't hit the "non-blob" case is that we are trying to
> analyze the object itself. So we don't pay any attention to the mode in
> the containing tree, but instead literally look for 81eae74b0. If it
> were a non-blob we'd complain then, but in fact we don't have it at all
> (which is otherwise OK because it's a gitlink).
> 
> > A small testcase to reproduce looks like this:
> > 
> > $ git init bar; cd bar
> > $ git fast-import <<EOF
> > commit refs/heads/bar
> > committer Bar <bar@bar> 0 +0000
> > data 0
> >                                                                  
> > M 160000 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
> >                                                                  
> > EOF
> >
> > [...]
> > 
> > Would it be reasonable to make the transfer.fsckObject checks ignore
> > non-blob .gitmodules?
> 
> I'm open to the idea that the new checks are too restrictive (both this
> and the gitmodulesParse error we're discussing in [1]). They definitely
> outlaw things that _used_ to be OK. And the security benefit is a little
> hand-wavy. They're not strictly needed to block the recent
> vulnerability[2].  However, they _could_ protect us from future problems
> (e.g., an overflow in the config-parsing code, which is not accessible
> to attackers outside of .gitmodules). So there is some value in being
> restrictive, but it's mostly hypothetical for now.
> 
> So I think we could simply loosen these cases without immediate effect.
> That said, I'm not sure that having a gitlink .gitmodules is sane in the
> first place. Attempting to "git submodule init" there is going to cause
> errors. Well, sort of -- your example actually includes it in a
> subdirectory of the repository, so technically we wouldn't use it for
> real submodules. That fudging (finding .gitmodules anywhere instead of
> just at the root) was a conscious decision to reduce the complexity and
> cost of the check.
> 
> It sounds like in this case you don't have existing history that does
> this with .gitmodules, but are rather looking into designing a new
> feature that stuffs data into git trees. I'd be interested to hear a bit
> more about that feature to see if there are other alternatives.

Yes, in fact, this history is not even meant to be checked out. It's
internal stuff, stuck in a ref in a non-traditional ref namespace (that
is, it's not under refs/heads, refs/tags, etc., it's under
refs/cinnabar)

I'd rather avoid using entirely new features that wouldn't work with
widely used git versions. I can actually work around the current problem
by adding a prefix to all file names. That doesn't sound great, but is
probably good enough. In fact, it might even avoid future similar problems.

That being said, I'm not even sure this particular use case is worth a
new feature. I'm not storing random stuff as gitlinks, I'm storing
sha1s. Well, maybe a mode that makes the distinction between "git oid"
and "external oid" might make things clearer for git itself, especially
for fsck.

Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checks added for CVE-2018-11235 too restrictive?
  2018-07-03 22:30   ` Mike Hommey
@ 2018-07-04  8:18     ` Mike Hommey
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Hommey @ 2018-07-04  8:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Wed, Jul 04, 2018 at 07:30:30AM +0900, Mike Hommey wrote:
> That being said, I'm not even sure this particular use case is worth a
> new feature. I'm not storing random stuff as gitlinks, I'm storing
> sha1s. Well, maybe a mode that makes the distinction between "git oid"
> and "external oid" might make things clearer for git itself, especially
> for fsck.

Actually, I'm also abusing the lower bits of the gitlink mode, to
differentiate between regular files, executable files, and symlinks.

Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-07-04  8:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-03  7:06 Checks added for CVE-2018-11235 too restrictive? Mike Hommey
2018-07-03 14:15 ` Jeff King
2018-07-03 17:45   ` Junio C Hamano
2018-07-03 18:34     ` Jeff King
2018-07-03 22:30   ` Mike Hommey
2018-07-04  8:18     ` Mike Hommey

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).