git@vger.kernel.org mailing list mirror (one of many)
From: Elijah Newren <newren@gmail.com>
To: anatoly techtonik <techtonik@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: Round-tripping fast-export/import changes commit hashes
Date: Mon, 1 Mar 2021 10:06:43 -0800
Message-ID: <CABPp-BE=9wzF6_VypoR-uEPHsLWdV7zyE13FOgLK0h8NOcMz3g@mail.gmail.com>
In-Reply-To: <CAPkN8xLE68d5Ngpy+LOQ8SALNgfB-+q4F3mFK-QBD=+EOKZSVg@mail.gmail.com>

On Sun, Feb 28, 2021 at 11:44 PM anatoly techtonik <techtonik@gmail.com> wrote:
>
> On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> > I think Elijah means that in the general case people are using fast
> > export/import to export/import between different systems or in
> > combination with a utility like git-filter-repo.
> >
> > In those cases users are also changing the content of the repository, so
> > the hashes will change, invalidating signatures.
> >
> > But there's also cases where e.g. you don't modify the history, or only
> > part of it, and could then preserve these headers. I think there's no
> > inherent reason not to do so, just that nobody's cared enough to submit
> > patches etc.
>
> Is fast-export/import the only way to filter information in `git`? Maybe there
> is a slow json-export/import tool that gives a complete representation of all
> events in a repository? Or API that can be used to serialize and import that
> stream?
>
> If no, then I'd like to take a look at where header filtering and serialization
> takes place. My C skills are at the "hello world" level, so I am not sure I can
> write a patch. But I can write the logic in Python and ask somebody to port
> that.

If you are intent on keeping signatures because you know they are
still valid, then you already know you aren't modifying any
blobs/trees/commits leading up to those signatures.  In that case,
perhaps you should avoid exporting the signature or anything it
depends on, and export only the history after that point.  You can
do this by passing fast-export the --reference-excluded-parents
option together with an exclusion range.  For example:

   git fast-export --reference-excluded-parents ^master~5 --all

and then pipe that through fast-import in a repository that already
contains the excluded commits.
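
To make that concrete, here is a rough sketch of such a partial round
trip.  Everything in it is made up for illustration: the throwaway
repositories under a temporary directory, the commit_n helper, and the
identity in the config.  It assumes a git new enough to have
--reference-excluded-parents, and that the target already holds the
excluded commits (here because it was cloned before the newer commits
were made).

```shell
set -eu
tmp=$(mktemp -d)
src="$tmp/src"       # hypothetical source repository
dst="$tmp/dst.git"   # hypothetical target repository

git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/master
git -C "$src" config user.name "Example User"
git -C "$src" config user.email "user@example.com"
git -C "$src" config commit.gpgsign false

# Illustrative helper: record one more commit on master.
commit_n() {
  echo "$1" > "$src/file.txt"
  git -C "$src" add file.txt
  git -C "$src" commit -q -m "commit $1"
}

# Old history: three commits, copied to the target by cloning.
for i in 1 2 3; do commit_n "$i"; done
git clone -q --bare "$src" "$dst"

# New history: five more commits, so master~5 is the last shared commit.
for i in 4 5 6 7 8; do commit_n "$i"; done

# Export only what comes after master~5; parents in the excluded
# range are referenced by hash instead of being re-exported.
git -C "$src" fast-export --reference-excluded-parents ^master~5 --all |
  git -C "$dst" fast-import --quiet

src_head=$(git -C "$src" rev-parse master)
dst_head=$(git -C "$dst" rev-parse master)
echo "source master: $src_head"
echo "target master: $dst_head"
```

With plain commits like these the two hashes should match, since
nothing in the excluded range was rewritten and the exported commits
contain nothing fast-export would normalize.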


In general, I think if fast-export or fast-import are lacking features
you want, we should add them there, but I don't see how adding
signature reading to fast-import and signature exporting to
fast-export makes sense as a general solution.  Suppose you extend
fast-import so it can process all the bits it is sent (e.g. commits
without an author, tags without a tagger, signed objects, any other
extended commit headers).  Suppose you also add flags to fast-export
to die if there are any bits it doesn't recognize and to export all
pieces of blobs/trees/tags (e.g. don't add missing authors, don't
re-encode messages in UTF-8, don't use grafts or replace objects,
keep extended headers such as signatures, etc.).  Even then, it still
couldn't work in all cases.  For example, if you had a repository
with unusual objects made by ancient or broken git versions (such as
tree entries in the wrong sort order, tree entries that recorded
modes of 040000 instead of 40000 for trees, or files with perms other
than 100644 or 100755), then when fast-import recreates these objects
in the canonical format they will no longer have the same hash, and
your commit signatures will be invalidated.  Other git commands will
also refuse to create objects with those oddities, even though git
accepts ancient objects that have them.
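
To illustrate the mode example, here is a small sketch that hashes two
hand-built tree bodies differing only in 40000 versus 040000.  It
relies on the object hash being the SHA-1 of "tree <size>\0" followed
by the body, and on each entry being "<mode> <name>\0" plus a 20-byte
binary object id.  The helpers fake_oid, sha1_hex and tree_hash, the
entry name "dir" and the placeholder id are all invented for the
example; it assumes sha1sum (or shasum) is installed.

```shell
set -eu
work=$(mktemp -d)

# Placeholder 20-byte object id for the subtree (arbitrary bytes 1..20).
fake_oid() {
  printf '\001\002\003\004\005\006\007\010\011\012'
  printf '\013\014\015\016\017\020\021\022\023\024'
}

# Print the hex SHA-1 of stdin, using whichever tool is available.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi |
    cut -d' ' -f1
}

# Hash a tree body the way git names objects: "tree <size>\0" + body.
tree_hash() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'tree %s\000' "$size"; cat "$1"; } | sha1_hex
}

# Same entry twice: canonical mode vs. the zero-padded historical form.
{ printf '40000 dir\000';  fake_oid; } > "$work/canonical"
{ printf '040000 dir\000'; fake_oid; } > "$work/padded"

canonical_hash=$(tree_hash "$work/canonical")
padded_hash=$(tree_hash "$work/padded")
echo "canonical (40000):    $canonical_hash"
echo "zero-padded (040000): $padded_hash"
```

The two names differ, so any commit (and any signature over it) that
points at the zero-padded tree cannot survive being rewritten in the
canonical form.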

So, it's basically impossible to have a "complete representation of
all events in a repository" that does what you want, other than the
*original* binary format.  (But if you really want to see the original
binary format, maybe `git cat-file --batch` will be handy to you.)
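
As a quick sketch of what that looks like, the following builds a
throwaway repository (the path, file name and identity are made up)
and reads objects back with `git cat-file --batch`, which prints a
"<oid> <type> <size>" header line followed by the raw contents of each
object named on stdin.

```shell
set -eu
repo=$(mktemp -d)   # hypothetical scratch repository

git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
git -C "$repo" config commit.gpgsign false

echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "example commit"
commit=$(git -C "$repo" rev-parse HEAD)

# Header line plus the raw commit (tree, author, committer, message).
echo "$commit" | git -C "$repo" cat-file --batch

# Keep just the header line for inspection.
header=$(echo "$commit" | git -C "$repo" cat-file --batch | head -n 1)
echo "header: $header"

# Trees are binary (NUL-terminated names, raw object ids), so dump
# them through od rather than printing them directly.
echo "$commit^{tree}" | git -C "$repo" cat-file --batch | od -c
```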

But I think fast-export's --reference-excluded-parents might come in
handy for you and let you do what you want.
