git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Lundkvist Per <per.lundkvist@saabgroup.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [EXTERNAL] Re: Soundness of signature verification excluding unsigned empty merges
Date: Wed, 22 Mar 2023 18:49:00 -0700	[thread overview]
Message-ID: <CABPp-BFeB9av00=HUt3-Q=e_croHHOKfFu0Oc=axkTQEddG9nQ@mail.gmail.com> (raw)
In-Reply-To: <573a64fb04584a4aa2329c471037d717@saabgroup.com>

On Wed, Mar 22, 2023 at 5:14 AM Lundkvist Per
<per.lundkvist@saabgroup.com> wrote:
>
> Elijah Newren <newren@gmail.com> wrote:
>
> ...
> > What does "validate" mean in this context?  Junio asked a bit about this.
> >
> > An alternative he didn't bring up is perhaps you are trying to protect
> > against supply chain attacks, with a scenario such as someone gains
> > remote access to some computers on your system, but not to any gpg
> > keys (because developers are using gpgs on yubikey or something like
> > that).  In such a case, you might ask the question of whether you can
> > determine if such an attacker has inserted additional commits to your
> > history.  You can use gpg-signed commits by known gpg-keys to rule out
> > most commits as being from a potential attacker, but odds are your git
> > hosting service does merges but doesn't sign them.  You want a way to
> > differentiate between merges it makes and ones an attacker might sneak
> > in.  And you are trying to equate "unsigned empty merge commit" with
> > "was a clean merge anyway, and we're not worried about any permutation
> > of clean merges of signed code".
> >
> > Or maybe you mean something completely different by "validate".
>
> No, you describe it perfectly :)

:-)

> > Junio brushed over whether you could assume "unsigned empty merge
> > commit" is equivalent to "was a clean merge", so I'll focus on that.
> > You often can make such an assumption, but it's not valid in general.
> >
[...]
> >
> > First off, I would use --remerge-diff over --cc.  --remerge-diff was
> > designed to show whether the user made changes relative to what a
> > current automatic remerge of the parents would give (so it shows
> > whether and how they resolved conflicts and if and what other changes
> > they added in).  --cc comes close when you only want to know if the
> > merge was clean, but still isn't quite the same.  One example I can
> > think of is that if there is a modify/delete conflict that the user
> > resolves in favor of keeping the file, then the merge obviously wasn't
> > clean.  But $(git show --pretty=format: --cc $i) will come back empty
> > (leading you to think the merge was clean) while $(git show
> > --pretty=format: --remerge-diff $i) will show there was a
> > modify/delete conflict.
>
> OK, I was not aware of --remerge-diff (we're currently on git 2.35), but from
> your description it seems to be much preferable. --cc makes it even easier to
> sneak in old commits cleanly than expected.
>
> > Second, even with remerge-diff, it is only checking whether a merge with
> > today's merge algorithm and config settings would be clean.  Changing
> > diff.algorithm could in rare cases affect whether commits can be merged
> > cleanly -- and the default in the past was "myers" whereas nowadays for merge
> > specifically it is "histogram".  Also, there have been changes to both of
> > these algorithms in the past (one notable example being the introduction of
> > diff.indentHeuristic and later turning it on by default), and there may be
> > more such changes in the future.  Similarly, merge.directoryRenames was
> > essentially "false" before it was introduced, then was "true" for a while,
> > then defaulted to "conflict".  Someone could have made a merge with an old
> > version of git that used either "false" or "true" for merge.directoryRenames
> > and have it be clean, but when you go to remerge the same commits today you
> > get a conflict due to that variable defaulting to "conflict".  Further, the
> > person running this script may have various `diff` or `merge` config settings
> > globally that differ from those used by whoever or whatever did the past
> > merges.  And, you have to account for the fact that the merge might have been
> > made with something other than git.  GitHub and GitLab used to use libgit2,
> > which had an entirely different merge algorithm.  Gerrit uses jgit, which has
> > its own merge algorithm.  Realistically, there isn't that much difference
> > between all these algorithms.  For nearly all merges, any of these merge
> > algorithms with any of these options will give the exact same answer...but
> > "nearly all" != "all".  So, there is no guarantee.  And we can't and won't
> > even guarantee it going forward even if you stick with Git and somehow ensure
> > using the same config settings either, because we may make further changes to
> > diff and merge algorithms in the future, which may affect the "clean
> > remergeability" (or whatever you call it) of merges you make today.
>
> Good points.
>
> I think we could live with false conflicts. We can acknowledge false negatives
> with a signed tag for only this purpose, right when they occur. We could also
> try to control the local git version and its settings. False positives however
> are worse of course.

So long as you use --remerge-diff rather than --cc, you should be able
to rely on "empty output means the merge would be clean -- with the
current merge algorithm and config options -- and match the results
recorded in the merge commit".  I designed it to behave exactly that
way, so if anyone ever discovers a case where --remerge-diff gives
empty output despite that not being true, then that would be a bug we
would need to investigate and fix.  And even if someone did make a
merge commit using some other algorithm where it conflicted, and they
tweaked it to fix the conflicts, show --remerge-diff would only come
back empty if the tweaks made by that individual with the other merge
algorithm happen to match what you'd get with a clean merge with Git's
current merge algorithm and config options.  So, you'd be pretty safe
from false positives.  You might have to wade through a some false
negatives when merge algorithms or config options change, but as long
as you're prepared for that, it shouldn't be a big deal.

So, if you did that, the major vectors left for an attacker to fool
the "was this code from one of our employees, or from a merge from our
CI system" that I can think of would be:
  * trawl through your code review system to look for historical
signed commits that passed CI with review comments like "We shouldn't
do this; it'd open a security hole.", because they could then merge
those bad changes.
  * find systems that sign commits automatically (in a big enough
organization there always seem to be exceptions to human-signed stuff.
Maybe a system that autofixes obvious code issues and pushes up an
amended-and-signed commit, or something similar, and perhaps the
autofixing doesn't check the input was signed, or the nature of the
autofixing can be controlled somehow.)
  * figure out how to access the system that records which gpg keys
are associated with employees, and add extra attacker-controlled
gpg-keys to existing employees.

> > All that said, the odds that
> > you discover a particular merge in the wild where any of this matters
> > to your usecase is probably small.  So you might get away with
> > it...but if you try it, you should probably add some good comments to
> > the code for whoever comes along and investigates a "bug" in the
> > future, to let them know that false negatives and false positives are
> > possible with your checks, but that you just assumed they would be
> > rare enough that you decided to ignore them.
> >
> > Finally, just a couple preference questions: I'd have used [ -z
> > "$diff" ] rather than [ ! "$diff" ], and a simple "exit 1" rather than
> > "rc=1" to exit early, but maybe you really want all the output.  But
> > if you do want the output, would it make more sense to have the  -z
> > "$diff" before the git verify-commit "$i"?  Also, I'd be tempted to
> > use "git log --format='%h %G? %s' $RANGE" rather than call git
> > verify-commit N times, and then post-process all the "N" cases, but
> > what you have works too.
>
> Yeah I wanted to show an as simple example as possible. The real version use
> explicit fifos, parallel xargs and shows an error for unsigned commits while
> minimising the amount of calls to git verify commit, for performance reasons. It
> just wouldn't have been a suitable example ;) I appreciate the feedback though,
> thank you!
>
> So for a smart enough attacker and a large enough commit history, depending on
> the nature of commits and certain time windows, they may be able to introduce
> previously signed commits that both go undetected and cause damage. This
> validation strategy is also brittle, since it depends on the internals of the
> git implementation and the chosen diff algorithm.
>
> The proper way, and the only way to fully validate the repository would still be
> to require all merge requests to be signed.
>
> We'll see, maybe this can serve as an interim, imperfect solution until we have
> changed our developer workflow. It looks like introducing signed commits and a
> validation strategy (although somewhat brittle) to our current developer
> workflow, would still be an improvement since it is both hopefully a bit tricky
> to reintroduce old signed commits and we currently have no way at all to really
> authenticate the commits.
>
> Thank you (and Junio) very much for the thorough responses, highly appreciated.

Glad it helped.

  reply	other threads:[~2023-03-23  1:49 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-21 10:32 Soundness of signature verification excluding unsigned empty merges Lundkvist Per
2023-03-21 16:43 ` Junio C Hamano
2023-03-22 11:50   ` [EXTERNAL] " Lundkvist Per
2023-03-22  1:41 ` Elijah Newren
2023-03-22 12:14   ` [EXTERNAL] " Lundkvist Per
2023-03-23  1:49     ` Elijah Newren [this message]
2023-03-23  9:55       ` Lundkvist Per

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BFeB9av00=HUt3-Q=e_croHHOKfFu0Oc=axkTQEddG9nQ@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=per.lundkvist@saabgroup.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).