git@vger.kernel.org mailing list mirror (one of many)
From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: anatoly techtonik <techtonik@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Round-tripping fast-export/import changes commit hashes
Date: Mon, 1 Mar 2021 12:17:36 -0800
Message-ID: <CABPp-BHdtAKz_V2RhBqevMy+Hy_rHtQ7Y2chggpt1rZ9nRn8Zw@mail.gmail.com>
In-Reply-To: <87ft1ek5dg.fsf@evledraar.gmail.com>

On Mon, Mar 1, 2021 at 12:04 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Mar 01 2021, Elijah Newren wrote:
>
> > On Sun, Feb 28, 2021 at 11:44 PM anatoly techtonik <techtonik@gmail.com> wrote:
> >>
> >> On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason
> >> <avarab@gmail.com> wrote:
> >> >
> >> > I think Elijah means that in the general case people are using fast
> >> > export/import to export/import between different systems or in
> >> > combination with a utility like git-filter-repo.
> >> >
> >> > In those cases users are also changing the content of the repository, so
> >> > the hashes will change, invalidating signatures.
> >> >
> >> > But there's also cases where e.g. you don't modify the history, or only
> >> > part of it, and could then preserve these headers. I think there's no
> >> > inherent reason not to do so, just that nobody's cared enough to submit
> >> > patches etc.
> >>
> >> Is fast-export/import the only way to filter information in `git`? Maybe there
> >> is a slow json-export/import tool that gives a complete representation of all
> >> events in a repository? Or API that can be used to serialize and import that
> >> stream?
> >>
> >> If no, then I'd like to take a look at where header filtering and serialization
> >> takes place. My C skills are at the "hello world" level, so I am not sure I can
> >> write a patch. But I can write the logic in Python and ask somebody to port
> >> that.
> >
> > If you are intent on keeping signatures because you know they are
> > still valid, then you already know you aren't modifying any
> > blobs/trees/commits leading up to those signatures.  If that is the
> > case, perhaps you should just avoid exporting the signature or
> > anything it depends on, and just export the stuff after that point.
> > You can do this with fast-export's --reference-excluded-parents option
> > and pass it an exclusion range.  For example:
> >
> >    git fast-export --reference-excluded-parents ^master~5 --all
> >
> > and then pipe that through fast-import.
> >
> >
> > In general, I think if fast-export or fast-import are lacking features
> > you want, we should add them there, but I don't see how adding
> > signature reading to fast-import and signature exporting to
> > fast-export makes sense in general.  Even if you assume fast-import
> > can process all the bits it is sent (e.g. you extend it to support
> > commits without an author, tags without a tagger, signed objects, any
> > other extended commit headers), and even if you add flags to
> > fast-export to die if there are any bits it doesn't recognize and to
> > export all pieces of blobs/trees/tags (e.g. don't add missing authors,
> > don't re-encode messages in UTF-8, don't use grafts or replace
> > objects, keep extended headers such as signatures, etc.), then it
> > still couldn't possibly work in all cases in general.  For example, if
> > you had a repository with unusual objects made by ancient or broken
> > git versions (such as tree entries in the wrong sort order, or tree
> > entries that recorded modes of 040000 instead of 40000 for trees or
> > something with perms other than 100644 or 100755 for files), then when
> > fast-import goes to recreate these objects using the canonical format
> > they will no longer have the same hash and your commit signatures will
> > get invalidated.  Other git commands will also refuse to create
> > objects with those oddities, even if git accepts ancient objects that
> > have them.
> >
> > So, it's basically impossible to have a "complete representation of
> > all events in a repository" that does what you want except for the
> > *original* binary format.  (But if you really want to see the original
> > binary format, maybe `git cat-file --batch` will be handy to you.)
> >
> > But I think fast-export's --reference-excluded-parents might come in
> > handy for you and let you do what you want.
>
> ...to add to that line of thinking, it's also a completely valid
> technique to just completely rewrite your repository, then (re-)push the
> old signed tags to refs/tags/*.

The repository in question didn't have any signed tags, just a signed commit.
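For a single signed commit, the --reference-excluded-parents approach quoted above applies directly: use the signed commit as the exclusion boundary, so fast-import never recreates it or anything it depends on. The sketch below is illustrative only. It builds a throwaway repository in which "$branch~5" stands in for the signed commit (no real signature is made, since that would need a configured GPG key), and the sed step is a hypothetical rewrite that edits one later commit message.

```shell
#!/bin/sh
# Sketch only: "$branch~5" plays the role of the signed commit; it is not
# actually signed here.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3 4 5 6 7; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

branch=$(git symbolic-ref --short HEAD)
base=$(git rev-parse "$branch~5")     # stand-in for the signed commit
old_tip=$(git rev-parse "$branch")

# Export only the commits after the base. --reference-excluded-parents makes
# the first exported commit name the base by object id ("from <id>") instead
# of becoming a new root, so the base itself is never part of the stream.
# The sed step (hypothetical rewrite) swaps one commit message for a string
# of the same length, so the stream's "data <n>" byte counts stay valid.
git fast-export --reference-excluded-parents "^$branch~5" --all |
  sed 's/^commit 7$/edited 7/' |
  git fast-import --quiet --force   # --force: the new tip is not a fast-forward

echo "base: $base -> $(git rev-parse "$branch~5")"
echo "tip:  $old_tip -> $(git rev-parse "$branch")"
```

The base id should come out unchanged, and so should commits 3 through 6, because fast-import recreates them from identical content. Only the rewritten commit (and anything built on top of it) gets a new id.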

> By default they won't be pulled down as they won't reference commits on
> branches you're fetching, and you can also stick them somewhere else
> than refs/tags/*, e.g. refs/legacy-tags/*.
>
> None of the commit history will be the same, but the content (mostly)
> will, which is usually what matters when checking out an old tag.
>
> Of course this hack has little benefit over just keeping a foo-old.git
> repo around, and moving on with new history in your new foo.git.
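Parking the old tags outside refs/tags/*, as suggested above, takes a single push refspec. A minimal sketch, assuming a hypothetical pre-rewrite repository `old` and a hypothetical repository `new.git` for the rewritten history (both created here as throwaway directories), with an unsigned annotated tag standing in for a signed one:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical pre-rewrite repository with one annotated tag.
git init -q old
git -C old config user.name "Example User"
git -C old config user.email "user@example.com"
git -C old commit -q --allow-empty -m "old history"
git -C old tag -a -m "old release" v1.0

# Hypothetical repository holding the rewritten history.
git init -q --bare new.git

# Copy every old tag object unchanged (plus the old commits it points at)
# into a separate namespace; nothing is re-created, so ids stay the same.
git -C old push -q ../new.git 'refs/tags/*:refs/legacy-tags/*'
git -C new.git for-each-ref --format='%(objecttype) %(refname)' refs/legacy-tags/

# The default clone/fetch refspecs do not cover refs/legacy-tags/*,
# so anyone who wants them has to ask explicitly:
git init -q probe
git -C probe fetch -q ../new.git 'refs/legacy-tags/*:refs/legacy-tags/*'
```

Because the tag objects are copied byte for byte rather than re-created, a real signed tag pushed this way should still pass `git verify-tag`.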
