On 2021-05-08 at 06:18:23, Dave Huseby wrote:
> Signed-off-by: Dave Huseby <dwh@linuxprogrammer.org>

You definitely need a commit message here explaining your approach and
justifying it.  I'm not saying that your approach is bad, but people
generally read the commit message to help understand and review the
commit, and we will want to know why you've made the decisions you've
made here and rejected other solutions, and the commit message will
hopefully explain that to us.

>  Signed Commits
>  ~~~~~~~~~~~~~~
> -We add a new field "gpgsig-sha256" to the commit object format to allow
> -signing commits without relying on SHA-1. It is similar to the
> -existing "gpgsig" field. Its signed payload is the SHA-256 content of the
> -commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
> +We use the new signature format whenever signing commits without relying
> +on SHA-1. The new format adds a 'signtype' field, zero or more 'signoption'
> +fields, and one 'sign' field to commit objects. This allows for the 'gpgsig'
> +field to coexist if needed.
> 
>  This means commits can be signed
> 
>  1. using SHA-1 only, as in existing signed commit objects
> -2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
> -   fields.
> -3. using only SHA-256, by only using the gpgsig-sha256 field.
> +2. using both SHA-1 and SHA-256, by using both gpgsig and the V2 format.
> +3. using only SHA-256, by only using the V2 format.

SHA-256 repositories already exist and already use this format.  You
cannot change it now, since doing so will needlessly break commits.
Just because they are experimental doesn't mean we should break
compatibility needlessly.

>  Old versions of "git verify-commit" can verify the gpgsig signature in
>  cases (1) and (2) without modifications and view case (3) as an
> @@ -427,17 +426,16 @@ ordinary unsigned commit.
>  Signed Tags
>  ~~~~~~~~~~~
> -We add a new field "gpgsig-sha256" to the tag object format to allow
> -signing tags without relying on SHA-1. Its signed payload is the
> -SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
> -SIGNATURE-----" delimited in-body signature removed.
> +We use the new signature format whenever signing tags without relying
> +on SHA-1. The new format adds a 'signtype' field, zero or more 'signoption'
> +fields, and one 'sign' field to tag objects. This allows for the in-body
> +SHA-1 signature to coexist if needed.
> 
>  This means tags can be signed
> 
>  1. using SHA-1 only, as in existing signed tag objects
> -2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
> -   signature.
> -3. using only SHA-256, by only using the gpgsig-sha256 field.
> +2. using both SHA-1 and SHA-256, by using both gpgsig and the V2 format.
> +3. using only SHA-256, by only using the V2 format.

The same thing here.  This is currently out of date; we use signed tags
for the same algorithm (that is, for the SHA-1 signature of a SHA-1
object and for the SHA-256 signature of a SHA-256 object) and fields for
the other.

> +Version 2 Format
> +----------------
> +
> +The goals of the version 2 format are as follows:
> +
> +- Backwards Compatible with version 1 without any intervention.
> +- Eliminate signature-format-specific string matching for detecting/parsing
> +  version 2 signatures.
> +- Use fields for storing signatures in all objects and transactions.
> +- Have a field that identifies the signature format.
> +- Have fields that store options required by the verification tool.
> +- Store signature data verbatim as created with the signing tool with escaped
> +  newlines and carriage returns.

We probably need to define what happens if both a v1 and v2 signature
occur.  Probably the commit is invalid and should be rejected.

> +Fields
> +~~~~~~
> +
> +The version 2 format uses three new fields listed below:
> +
> +- signtype <signature format name>
> +- signoption <token> [= <value>]
> +- sign <multiline signature data>
> +
> +A complete signature must include exactly one `signtype` field, zero or more
> +`signoption` fields and one `sign` field.

This is going to be a problem.  We want to dual-sign objects over both
their SHA-1 and SHA-256 values.  We already have dual-signed objects in
the repository.

> +The `signtype` field has a string value that identifies the type of signature
> +that is stored in the object. This string is case-insensitive and is used to
> +identify which configuration settings are to used when verifying the signature.
> +The corresponding configuration settings are in the config file as:
> +'sign.<signType>.options.*'. The 'sign' field stores the text-encoded signature
> +from the signing tool with any newlines and carriage returns escaped as '%0a'
> +and '%0d' respectively. Whitespace is significant in digital signatures and
> +the protocol for sending/receiving data to/from signing and verification tools
> +requires that newlines and carriage returns are escaped. By storing the escaped
> +version in the 'sign' field, we avoid extra processing and error-prone
> +escaping.

Is there a reason we can't use the indented block format currently used
by gpgsig and friends?  Are there really formats that need to support
carriage returns in their signature?  If so, can we just specify those
algorithms will encode it with base64?

Also, it will be required to escape other values as well.  For example,
if you encode things with %, then you must escape %.  Git will also not
allow NUL bytes in commit objects, so if you want to encode arbitrary
binary data (which I would just do by requiring people to base64, but
you may not want to), you'll need to escape them as well.

> +The introduction of the 'signoption' field is necessary for Git to remain
> +agnostic to the tools used for signing and verification of cryptographic
> +signatures. Some signign tools do not produce cryptographic signatures that
> +include all of the data needed to verify the signature. A good example of this
> +is the minisign tool (https://jedisct1.github.io/minisign/) which requires the
> +public key to be supplied to the verification operation. In that case, the
> +signing operation would return to Git a 'signoption' field along with the
> +'signtype' and 'sign' fields to be stored in the Git object. When Git verifies
> +the signature, it will parse the 'signoption' fields and send them to the
> +verification tool as OPTION commands. In this case it will send the public key
> +along with the signature for verification. This design allows for arbitrary
> +options to be stored in the Git object by the signing tool that will get passed
> +to the verification tool later without Git knowing or understanding any details
> +of a particular signing tool.

While I'd like to see us support minisign, I have concerns about
embedding the public key in the signature.  First, that bloats commit
objects, and large commit objects can cause performance and scalability
problems.  Many things like partial clone implicitly assume that commits
and tags are small.

Second, the goal is presumably to verify that the signature identifies
some relevant party, not just some arbitrary user.  As a consequence, I
think it's safe to assume that we have a way to acquire the public key
of the trusted user whose signature we want to verify.

> +The order in which OPTION commands are sent to the signing and verification
> +tools is significant. OPTION commands that come later override OPTION commands
> +that came earlier and had the same token name. Git always sends OPTION commands
> +from the command line after the options from the config. In verification
> +operations, Git sends the options from the signed object first, before the
> +config and command line options. This ensures local control over option values.
> +An example would be if the sign.openpgp.options.minTrustLevel config option is
> +set to "marginal" and the command line `--sign-option minTrustLevel = full` is
> +issued. Git would first send an OPTION command setting
> +`minTrustLevel = marginal` from the config and then override that by later
> +sending an OPTION command setting `minTrustLevel = full` from the command line.

It's a security problem to allow people to control verification
parameters or command-line options.  The former allows users to validate
signatures using weak algorithms or reduced trust levels, and the latter
may allow arbitrary code execution.

Unless this field is going to be used for some signature parameter other
than public keys or verification parameters, I'd rather we omit it.  If
we keep it, the allowed parameters must be strictly defined and
documented per format and not allowed to contain arbitrary data.

Overall, I'm generally positive on this approach, although I think it
needs some further refinements as I've mentioned above.  My preference
is that we specifically document specific signature schemes and, if we
keep the options, which options are allowed.  For example, "minisign"
and "signify" are compatible, and by specifying and documenting
well-known formats, we avoid the problem where people end up writing
invalid commits by typoing the name in their config.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US