git@vger.kernel.org mailing list mirror (one of many)
From: Elijah Newren <newren@gmail.com>
To: anatoly techtonik <techtonik@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Johannes Sixt" <j6t@kdbg.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: Round-tripping fast-export/import changes commit hashes
Date: Mon, 9 Aug 2021 11:15:44 -0700
Message-ID: <CABPp-BH5RhHR-KhhumuhZGy2F4ypUBoqgAatY5MKkQsB46KM4g@mail.gmail.com>
In-Reply-To: <CAPkN8x+agKRRD0Zd-pxs_EuYO_Xm8EyE0nJLCWQB4KNuNkvK8Q@mail.gmail.com>

On Mon, Aug 9, 2021 at 8:45 AM anatoly techtonik <techtonik@gmail.com> wrote:
>
> On Thu, Mar 4, 2021 at 3:56 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Johannes Sixt <j6t@kdbg.org> writes:
> >
> > > Am 02.03.21 um 22:52 schrieb anatoly techtonik:
> > >> For my use case, where I just need to attach another branch in
> > >> time without altering original commits in any way, `reposurgeon`
> > >> can not be used.
> > >
> > > What do you mean by "attach another branch in time"? Because if you
> > > really do not want to alter original commits in any way, perhaps you
> > > only want `git fetch /the/other/repository master:the-other-one-s-master`?
> >
> > Yeah, I had the same impression.  If a bit-for-bit identical copy of
> > the original history is needed, then fetching from the original
> > repository (either directly or via a bundle) would be a much simpler
> > and performant way.
>
> The goal is to have an editable stream, which, if left without edits, would
> be bit-by-bit identical, so that external tools like `reposurgeon` could
> operate on that stream and be audited.

There were some patches proposed some months back[1] to make
fast-import allow importing signed commits...except that they
unconditionally kept the signatures and didn't do any validation,
which would have resulted in invalid signatures if any edits happened.
I suggested adding signature verification (which would allow options
like erroring out if they didn't match, or dropping signatures when
they didn't match but keeping them otherwise).  That'd help use cases
like yours.  The author wasn't interested in implementing that
suggestion (and it's a low priority for me that I may never get around
to).  The series also wasn't pushed through and eventually was
dropped.

However, that wouldn't fully solve your stated goal.  As mentioned
earlier in this thread, I don't think your stated goal is
realistic; the only complete bit-for-bit identical representation of
the repository is the original binary format.

Your stated goal here, however, isn't required for solving the use case
you present.

[1] https://lore.kernel.org/git/20210430232537.1131641-1-lukeshu@lukeshu.com/
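One way the "keep the signature only if it still matches" option could work without invoking gpg is sketched below in Python, assuming the original signed commit's raw bytes are available (e.g. from `git cat-file commit <oid>`). A commit signature covers the commit object with its gpgsig header removed, so if the commit fast-import would write is byte-identical to that payload, the original signature still applies to it. The mode names "abort" and "strip" and both function names are hypothetical, not options of any existing patch. This only checks that the signed payload is unchanged; it does not check the signature itself, which is what `git verify-commit` is for.

```python
def strip_gpgsig(raw: bytes) -> bytes:
    """Return a raw commit object with its gpgsig/gpgsig-sha256 header removed."""
    # Headers end at the first blank line; blank lines inside the signature
    # are stored as a single space, so they never look like "\n\n".
    header, sep, message = raw.partition(b"\n\n")
    kept = []
    skipping = False
    for line in header.split(b"\n"):
        if skipping and line.startswith(b" "):
            continue  # continuation line of the signature header
        skipping = line.startswith((b"gpgsig ", b"gpgsig-sha256 "))
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + message


def resolve_signature(original: bytes, rebuilt: bytes, mode: str) -> bytes:
    """Decide which commit bytes to write (hypothetical policy sketch).

    original: the signed commit as stored in the source repository.
    rebuilt:  the unsigned commit the import would produce.
    """
    if strip_gpgsig(original) == rebuilt:
        # Signed payload is unchanged, so keep the commit exactly as it was.
        return original
    if mode == "abort":
        raise ValueError("commit was edited; its signature no longer matches")
    if mode == "strip":
        return rebuilt  # drop the now-invalid signature
    raise ValueError(f"unknown mode {mode!r}")
```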

> Right now, because the repository
> https://github.com/simons-public/protonfixes contains a signed commit
> right from the start, the simple fast-export and fast-import with git itself
> fails the check.

Yes, and I mentioned several other reasons why a round-trip from
fast-export through fast-import cannot be relied upon to preserve
object hashes.
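To make the signed-commit failure concrete, here is a small Python sketch (not part of git) that computes an object ID the way git does in a SHA-1 repository: SHA-1 over "<type> SP <size> NUL" followed by the raw object body. The commit and its "signature" below are made-up placeholders; the point is only that a commit recreated without its gpgsig header has different bytes and therefore a different hash, and every descendant changes with it.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # empty tree, SHA-1


def git_object_id(obj_type: str, body: bytes) -> str:
    # SHA-1 over "<type> SP <size> NUL" followed by the raw object body.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# A made-up root commit, as it would look once the signature is gone.
unsigned = (
    f"tree {EMPTY_TREE}\n"
    "author A U Thor <author@example.com> 1614600000 +0000\n"
    "committer A U Thor <author@example.com> 1614600000 +0000\n"
    "\n"
    "Initial commit\n"
).encode()

# The same commit with a gpgsig header (placeholder, not a real signature);
# continuation lines of the header start with a space.
signed = unsigned.replace(
    b"\n\n",
    b"\ngpgsig -----BEGIN PGP SIGNATURE-----\n \n placeholder\n"
    b" -----END PGP SIGNATURE-----\n\n",
    1,
)

print(git_object_id("commit", signed))
print(git_object_id("commit", unsigned))
```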

> I understand that patching `git` to add `--complete` to fast-import is
> realistically beyond my coding abilities, and my only option is to parse

It'd take more patching than that:
(1) It'd be both fast-export and fast-import that would need patching,
not just fast-import.
(2) --complete is a bit of a misnomer too, because it's not just
get-all-the-data, it's keep-the-data-in-the-original-format.  If
tree entries recorded a mode of 040000 instead of 40000, despite meaning the same
thing, you'd have to prevent canonicalization and store them as the
original recorded value or you'd get a different hash.  Ditto for
commit messages with extra data after a NUL byte, and a variety of
other possible issues.
(3) fast-export works by looking for the relevant bits it knows how to
export.  You'd have to redesign it to fully parse every bit of data in
each object it looks at, throw errors if it didn't recognize any, and
make sure it exports all the bits.  That might be difficult since it's
hard to know how to future-proof it.  How do you guarantee you've
printed every field in a commit struct, when that struct might gain
new fields in the future?  (This is especially challenging since
fast-export/fast-import might not be considered core tools, or at
least don't get as much attention as the "truly core" parts of git;
see https://lore.kernel.org/git/xmqq36mxdnpz.fsf@gitster-ct.c.googlers.com/)
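The canonicalization issue in (2) can be illustrated with a short Python sketch, assuming a SHA-1 repository. It builds two one-entry tree objects that point at the same (empty) subdirectory and differ only in how the directory mode is spelled; the entry name "sub" is made up.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # empty tree, SHA-1


def git_object_id(obj_type: str, body: bytes) -> str:
    # SHA-1 over "<type> SP <size> NUL" followed by the raw object body.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: bytes, name: bytes, oid_hex: str) -> bytes:
    # One tree entry: "<mode> SP <name> NUL <20-byte binary object ID>".
    return mode + b" " + name + b"\0" + bytes.fromhex(oid_hex)


# Same directory, same contents; only the spelling of the mode differs.
canonical = git_object_id("tree", tree_entry(b"40000", b"sub", EMPTY_TREE))
zero_padded = git_object_id("tree", tree_entry(b"040000", b"sub", EMPTY_TREE))

print(canonical)
print(zero_padded)
```

An importer that reads "040000" and writes back the canonical "40000" therefore produces a different tree ID, and every commit that contains that tree changes too.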

> the binary stream produced by `git cat-file --batch`, which I also won't
> be able to do without specification.

The specification is already available in the manual.  Just run
`git cat-file --help` to see it.  Let me quote part of it for you:

       For example, --batch without a custom format would produce:

           <sha1> SP <type> SP <size> LF
           <contents> LF
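Following that documented format, a minimal Python parser for default-format `git cat-file --batch` output might look like the sketch below. The names BatchObject and parse_batch are hypothetical, and any header that is not "<oid> <type> <size>" (for example the "missing" line git prints for an unknown object) is treated as an error here rather than handled.

```python
from typing import Iterator, NamedTuple


class BatchObject(NamedTuple):
    oid: str
    type: str
    contents: bytes


def parse_batch(stream: bytes) -> Iterator[BatchObject]:
    """Parse '<oid> SP <type> SP <size> LF <contents> LF' records."""
    pos = 0
    while pos < len(stream):
        eol = stream.index(b"\n", pos)
        fields = stream[pos:eol].decode().split(" ")
        if len(fields) != 3:
            raise ValueError(f"unexpected header: {' '.join(fields)}")
        oid, obj_type, size = fields
        start = eol + 1
        end = start + int(size)
        # The contents are raw bytes and may themselves contain LF or NUL,
        # so they are sliced by size, never split on newlines.
        if stream[end:end + 1] != b"\n":
            raise ValueError(f"missing LF after contents of {oid}")
        yield BatchObject(oid, obj_type, stream[start:end])
        pos = end + 1
```

To feed it real data, something like `subprocess.run(["git", "cat-file", "--batch"], input=b"HEAD\n", capture_output=True, check=True).stdout` can supply the stream.  The contents are the raw object bytes, so tree objects still contain binary 20-byte object IDs that need further decoding.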

> P.S. I am resurrecting the old thread, because my problem with editing
> the history of the repository with an external tool still can not be solved.

Sure it can: just use fast-export's --reference-excluded-parents
option and don't export commits you know you won't need to change.
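Sketched in Python, that workflow might look like the following. It assumes it is run inside the repository being rewritten, that `base` is the last commit you don't need to change (for instance the signed commit), and that `edit` is whatever tool rewrites the stream; export_cmd and rewrite_range are hypothetical helper names. `--force` is passed to fast-import because the rewritten branch will usually not be a fast-forward of the old one.

```python
import subprocess
from typing import Callable


def export_cmd(base: str, branch: str) -> list[str]:
    # Export only base..branch; commits whose parents lie outside that range
    # refer to those existing parents by hash instead of losing them.
    return ["git", "fast-export", "--reference-excluded-parents",
            f"{base}..{branch}"]


def rewrite_range(base: str, branch: str,
                  edit: Callable[[bytes], bytes]) -> None:
    # Must run inside the repository whose history is being rewritten.
    exported = subprocess.run(export_cmd(base, branch),
                              capture_output=True, check=True).stdout
    # --force lets fast-import move the branch even though the new history
    # is not a fast-forward of the old one.
    subprocess.run(["git", "fast-import", "--force"],
                   input=edit(exported), check=True)
```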

Or, if for some reason you are really set on exporting everything and
then editing, then go ahead and create the full fast-export output,
including all your edits, and then post-process it manually
before feeding to fast-import.  In particular, in the post-processing
step, find the problematic commits that you know won't be
modified, such as your signed commit.  Then go edit that fast-export
dump and (a) remove the dump of the no-longer-signed signed commit
(because you don't want it), and (b) replace any references to the
no-longer-signed-commit (e.g. "from :12") to instead use the hash of
the actual original signed commit (e.g. "from
d3d24b63446c7d06586eaa51764ff0c619113f09").  If you do that, then git
fast-import will just build the new commits on the existing signed
commit instead of on some new commit that is missing the signature.
Technically, you can even skip step (a), as all it will do is produce
an extra commit in your repository that isn't used and thus will be
garbage collected later.
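Step (b) can be scripted. The Python sketch below (hypothetical helper name replace_mark) rewrites `from :<mark>` and `merge :<mark>` commands so they point at a given commit hash, and copies the payload of every `data <count>` command verbatim so that text inside commit messages or file contents is never touched. It assumes the stream uses the `data <count>` form, which is the form fast-export writes; the delimited `data <<EOF` form is not handled.

```python
def replace_mark(stream: bytes, mark: int, oid: str) -> bytes:
    """Point 'from :<mark>' / 'merge :<mark>' commands at commit <oid>."""
    target = f":{mark}".encode()
    out = []
    pos = 0
    while pos < len(stream):
        eol = stream.find(b"\n", pos)
        if eol == -1:
            eol = len(stream)
        line = stream[pos:eol]
        nxt = eol + 1
        if line.startswith(b"data "):
            # Copy the header and the <count> payload bytes unchanged.
            count = int(line[5:])
            out.append(stream[pos:nxt + count])
            pos = nxt + count
            continue
        cmd, _, arg = line.partition(b" ")
        if cmd in (b"from", b"merge") and arg == target:
            line = cmd + b" " + oid.encode()
        out.append(line + stream[eol:nxt])  # keep the original LF, if any
        pos = nxt
    return b"".join(out)
```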
