git@vger.kernel.org mailing list mirror (one of many)
* Making GitGitGadget conversion lossless
From: Konstantin Ryabitsev @ 2020-02-26 20:09 UTC
  To: git; +Cc: vegard.nossum

Hi, all:

GitGitGadget is great, and I'm looking forward to adapting it to the
Linux kernel's needs. There is one area where I think the situation can
be further improved: making the conversion of a pull request into a
patch series 100% reversible. As of right now, the following data is
permanently lost from commits as they are converted into patches:

- parent/tree hashes
- author/committer information
- cryptographic attestation (gpgsig)

There is an existing body of work done by Vegard Nossum [1] that makes 
it possible to fully reconstruct a git commit from an email message, and 
I hope that it can make its way into official upstream. If that were to 
happen, it would mean that converting from a pull request into a patch 
series would become a lossless operation and tools like GitGitGadget 
would be able to preserve full cryptographic attestation of commits.

Vegard, if there is interest in getting this work into upstream, are you 
in a position to continue your work on it?

Best regards,
-K

[1]: https://lore.kernel.org/git/20191022114518.32055-1-vegard.nossum@oracle.com/#t


* Re: Making GitGitGadget conversion lossless
From: Junio C Hamano @ 2020-02-26 21:01 UTC
  To: Konstantin Ryabitsev; +Cc: git, vegard.nossum

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> - parent/tree hashes

Isn't this already available by recording the base-commit
information?
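
For context, `git format-patch --base=<commit>` appends a `base-commit: <object name>` line (and `prerequisite-patch-id:` lines where needed) to its output, so the receiver knows which commit the series was built on. A minimal Python sketch of picking that line out follows; the helper name `find_base_commit` is hypothetical, and the pattern accepts either 40-digit SHA-1 or 64-digit SHA-256 object names. Note that this line records the base of the series as a whole, not the exact parent and tree of every commit in it.

```python
from __future__ import annotations

import re

# `git format-patch --base=<commit>` emits a line of the form
# "base-commit: <object name>". Accept SHA-256 (64) or SHA-1 (40) names.
BASE_COMMIT_RE = re.compile(
    r"^base-commit: ([0-9a-f]{64}|[0-9a-f]{40})$", re.MULTILINE
)


def find_base_commit(patch_text: str) -> str | None:
    """Return the base commit named in a patch or cover letter, if any."""
    match = BASE_COMMIT_RE.search(patch_text)
    return match.group(1) if match else None
```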

> - author/committer information
> - cryptographic attestation (gpgsig)

I think you are aiming to come up with bit-for-bit identical commit
the sender had, and I would imagine that the easiest and least
disruptive way to do so is to add a compressed and ascii-armored
copy of "git cat-file commit" output of the original commit after
the "---" line before the diff/diffstat of the e-mailed patch.  The
receiving end can then act on it when given some option by

 - first recover the contents of the commit object (call it #1);
 - learn the parent commit(s) and check out the tree;
 - apply the patch in the remainder of the patch e-mail to the tree;
 - make sure that the result of patch application gives the tree object
   recorded in #1;
 - run "hash-object -t commit -w" over #1 that gives you a commit
   object that is bit-for-bit identical.
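
Steps 4 and 5 above reduce to two checks, sketched here in Python. Git names an object by hashing the header `<type> <size>` plus a NUL byte, followed by the object's bytes; this is what `git hash-object -t commit` computes, and in a SHA-1 repository it is a plain SHA-1. The sketch assumes the recovered commit object (#1) is available as bytes and that the tree resulting from applying the patch has already been obtained elsewhere (for example with `git write-tree`); the function names are hypothetical.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 name Git would give an object of this type."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def recorded_tree(commit: bytes) -> str:
    """Return the tree named on the first line of a raw commit object."""
    first_line = commit.split(b"\n", 1)[0]
    kind, _, value = first_line.partition(b" ")
    if kind != b"tree":
        raise ValueError("commit object does not start with a tree line")
    return value.decode("ascii")


def reconstruct(commit: bytes, applied_tree: str) -> str:
    """Step 4: the applied patch must yield the tree recorded in #1.
    Step 5: return the name that hashing #1 as a commit produces."""
    if recorded_tree(commit) != applied_tree:
        raise ValueError("patch did not reproduce the recorded tree")
    return object_id("commit", commit)
```

Because the name is computed over every byte of the object, a reconstructed commit whose name matches the sender's is, barring a hash collision, bit-for-bit identical to it.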

As I said already, I do not think that the desire to get the
bit-for-bit identical commit is compatible with the idea to discuss
e-mailed patches---the pieces of patch e-mail will become "you may
look at them, you may apply them, but it is no use to comment on
them to get them improved".  So, I dunno.

* Re: Making GitGitGadget conversion lossless
From: Vegard Nossum @ 2020-02-26 21:32 UTC
  To: Junio C Hamano, Konstantin Ryabitsev; +Cc: git

On 2/26/20 10:01 PM, Junio C Hamano wrote:
> As I said already, I do not think that the desire to get the
> bit-for-bit identical commit is compatible with the idea to discuss
> e-mailed patches---the pieces of patch e-mail will become "you may
> look at them, you may apply them, but it is no use to comment on
> them to get them improved".  So, I dunno.

For me, at least, the goal was to be able to store previous patch
submissions in git (even if it is not merged into the main tree) so
that you can use git and all its tools (diff, log, blame, grep, notes,
etc.) to browse previous versions and browse discussions _and_ use the
SHA1 as a stable identifier for a specific submission.

The point of having the stable identifier is so that the submitter can
take comments into account and resubmit their patchset while still
keeping a (stable, universal, unambiguous) reference to their previous
submission.

I don't see the incompatibility at all. The whole point was that the
current email workflow used by Linux and git (that includes discussion,
feedback, and revision) _does not need to change_.


Vegard

* Re: Making GitGitGadget conversion lossless
From: Konstantin Ryabitsev @ 2020-02-26 21:35 UTC
  To: Junio C Hamano; +Cc: git, vegard.nossum

On Wed, Feb 26, 2020 at 01:01:15PM -0800, Junio C Hamano wrote:
> Isn't this already available by recording the base-commit
> information?
> 
> > - author/committer information
> > - cryptographic attestation (gpgsig)
> 
> I think you are aiming to come up with bit-for-bit identical commit
> the sender had, and I would imagine that the easiest and least
> disruptive way to do so is to add a compressed and ascii-armored
> copy of "git cat-file commit" output of the original commit after
> the "---" line before the diff/diffstat of the e-mailed patch.  The
> receiving end can then act on it when given some option by
> 
>  - first recover the contents of the commit object (call it #1);
>  - learn the parent commit(s) and check out the tree;
>  - apply the patch in the remainder of the patch e-mail to the tree;
>  - make sure that the result of patch application gives the tree object
>    recorded in #1;
>  - run "hash-object -t commit -w" over #1 that gives you a commit
>    object that is bit-for-bit identical.

Right, I just don't want to be doing this in a separate tool. :)

> As I said already, I do not think that the desire to get the
> bit-for-bit identical commit is compatible with the idea to discuss
> e-mailed patches---the pieces of patch e-mail will become "you may
> look at them, you may apply them, but it is no use to comment on
> them to get them improved".

I disagree -- specifically from the attestation point of view. One of 
the drawbacks of platforms like lore.kernel.org is that it creates an 
opportunity for a malicious actor to compromise it and modify patches 
that they know will be downloaded and applied by Linux maintainers -- so 
my goal is to ensure that we do not have to trust lore.kernel.org in 
order to trust patches downloaded from it. This means some mechanism for 
end-to-end patch attestation.

There are two avenues that I am pursuing for this purpose:

1. being able to submit attestation information out-of-band, see 
   discussion here: 
   https://lore.kernel.org/workflows/20200226172502.q3fl67ealxsonfgp@chatter.i7.local/T/#u
2. being able to preserve commit signatures as they are converted into 
   patches and back

I know that it is very uncommon for patches to be applied without any
changes, because the maintainer would almost always add their
Signed-off-by trailer before applying them to their tree. However,
preserving full commit metadata allows checking cryptographic
attestation *before* adding trailers or making any other edits, for
example by making a shallow clone of the repository, applying the series
"verbatim", as you describe above, and then verifying the signature at
the tip. If "git verify-commit HEAD" is successful, then the maintainer
can be assured that patch contents have not been modified between when
they left the developer's system and arrived at the maintainer's
workstation.

This means nobody needs to trust me or other members of the sysadmin 
team responsible for lore.kernel.org in order to trust patches they 
retrieve from it.
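
To illustrate why the check has to happen before any trailer is added: to our understanding, the signature in a commit's `gpgsig` header covers the commit object with that header (and its continuation lines) removed, so any change to the message, including an added Signed-off-by line, changes the signed bytes. Below is a minimal Python sketch of deriving that payload, assuming a SHA-1 repository where the signature is stored in `gpgsig`. The function name `signed_payload` is hypothetical, and the actual cryptographic check would still be left to `git verify-commit`.

```python
def signed_payload(commit: bytes) -> bytes:
    """Return the bytes covered by a commit's gpgsig signature: the raw
    commit object with the gpgsig header and its continuation lines removed."""
    head, sep, message = commit.partition(b"\n\n")
    kept = []
    in_signature = False
    for line in head.split(b"\n"):
        if line.startswith(b"gpgsig "):
            in_signature = True  # first line of the signature header
            continue
        if in_signature and line.startswith(b" "):
            continue  # continuation line of the signature
        in_signature = False
        kept.append(line)
    return b"\n".join(kept) + sep + message
```

Appending a Signed-off-by trailer to the message yields a different payload, so the original signature no longer matches it.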

Best,
-K

* Re: Making GitGitGadget conversion lossless
From: Junio C Hamano @ 2020-02-26 22:27 UTC
  To: Konstantin Ryabitsev; +Cc: git, vegard.nossum

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Wed, Feb 26, 2020 at 01:01:15PM -0800, Junio C Hamano wrote:
>> Isn't this already available by recording the base-commit
>> information?
>> 
>> > - author/committer information
>> > - cryptographic attestation (gpgsig)
>> 
>> I think you are aiming to come up with bit-for-bit identical commit
>> the sender had, and I would imagine that the easiest and least
>> disruptive way to do so is to add a compressed and ascii-armored
>> copy of "git cat-file commit" output of the original commit after
>> the "---" line before the diff/diffstat of the e-mailed patch.  The
>> receiving end can then act on it when given some option by
>> 
>>  - first recover the contents of the commit object (call it #1);
>>  - learn the parent commit(s) and check out the tree;
>>  - apply the patch in the remainder of the patch e-mail to the tree;
>>  - make sure that the result of patch application gives the tree object
>>    recorded in #1;
>>  - run "hash-object -t commit -w" over #1 that gives you a commit
>>    object that is bit-for-bit identical.
>
> Right, I just don't want to be doing this in a separate tool. :)

Yes, and I just outlined how it can be expressed in the
"format-patch" output format, and implemented on the "am" side, as
part of "git".
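
One way the "format-patch" and "am" halves of that outline could look is sketched below in Python. The compression (zlib), the armor (base64 wrapped at 64 columns) and the BEGIN/END marker lines are illustrative assumptions, not an existing Git format; `embed`, `extract` and `armor` are hypothetical names. `embed` places the block right after the first `---` line, before the diffstat, and `extract` recovers the original commit object bytes.

```python
from __future__ import annotations

import base64
import zlib

# Illustrative marker lines; not an existing Git format.
BEGIN = "-----BEGIN GIT COMMIT OBJECT-----"
END = "-----END GIT COMMIT OBJECT-----"


def armor(commit: bytes, width: int = 64) -> str:
    """Compress the raw commit object and wrap it as base64 text."""
    data = base64.b64encode(zlib.compress(commit)).decode("ascii")
    lines = [data[i:i + width] for i in range(0, len(data), width)]
    return "\n".join([BEGIN, *lines, END])


def embed(patch: str, commit: bytes) -> str:
    """Insert the armored commit right after the first '---' line,
    i.e. before the diffstat and the diff ("format-patch" side)."""
    separator = "\n---\n"
    pos = patch.find(separator)
    if pos < 0:
        raise ValueError("patch has no '---' separator line")
    cut = pos + len(separator)
    return patch[:cut] + armor(commit) + "\n\n" + patch[cut:]


def extract(patch: str) -> bytes | None:
    """Recover the raw commit object, if present ("am" side)."""
    start = patch.find(BEGIN + "\n")
    if start < 0:
        return None
    end = patch.find("\n" + END, start)
    if end < 0:
        raise ValueError("unterminated armored commit block")
    body = patch[start + len(BEGIN) + 1:end]
    return zlib.decompress(base64.b64decode("".join(body.split())))
```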

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
