* Round-tripping fast-export/import changes commit hashes
@ 2021-02-27 12:31 anatoly techtonik
  2021-02-27 17:48 ` Elijah Newren
  0 siblings, 1 reply; 21+ messages in thread

From: anatoly techtonik @ 2021-02-27 12:31 UTC (permalink / raw)
To: Git Mailing List

Hi.

I can't get the same commit hashes after a fast-export and then fast-import
of this repository, without any edits: https://github.com/simons-public/protonfixes
I have no idea what causes this or how to prevent it from happening. Are
there any workarounds?

What did you do before the bug happened? (Steps to reproduce your issue)

    #!/bin/bash
    git clone https://github.com/simons-public/protonfixes.git
    git -C protonfixes log --format=oneline | tail -n 4
    git init protoimported
    git -C protonfixes fast-export --all --reencode=no | (cd protoimported && git fast-import)
    git -C protoimported log --format=oneline | tail -n 4

What did you expect to happen? (Expected behavior)

Expected the imported repo to match the exported one.

What happened instead? (Actual behavior)

All hashes are different; the imported repo diverged at the second commit.

What's different between what you expected and what actually happened?
The log of hashes from the initial repo:

    + git -C protonfixes log --format=oneline
    + tail -n 4
    1c0cf2c8e742e673dba9fd1a09afd12a25c25571 Update README.md
    367d61f9b2a799accbdaeed5d64f9be914ca0f7a Updated zip link
    d3d24b63446c7d06586eaa51764ff0c619113f09 Update README.md
    7a43ca89ff7a70127ac9ca0f10b6eaaa34f2f69c Initial commit

The log from the imported repo:

    + git -C protoimported log --format=oneline
    + tail -n 4
    a27ec5d2e4c562f40e693e0b4149959d2b69bf21 Update README.md
    e59cf92be79c47984e9f94bfad912e5a29dfa5e0 Updated zip link
    fb6498f62af783d2e943770f90bc642cf5c9ec9c Update README.md
    7a43ca89ff7a70127ac9ca0f10b6eaaa34f2f69c Initial commit

[System Info]
git version: git version 2.31.0.rc0
cpu: x86_64
built from commit: 225365fb5195e804274ab569ac3cc4919451dc7f
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.8.0-43-generic #49-Ubuntu SMP Fri Feb 5 03:01:28 UTC 2021 x86_64
compiler info: gnuc: 10.2
libc info: glibc: 2.32
$SHELL (typically, interactive shell): /usr/bin/zsh

[Enabled Hooks]
not run from a git repository - no hooks to show

--
anatoly t.

^ permalink raw reply	[flat|nested] 21+ messages in thread
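For reference, the round-trip mechanism itself is not at fault here: an all-unsigned history survives fast-export | fast-import with identical hashes, which is what narrows the report down to the signed second commit. A throwaway offline sketch (the `src`/`dst` repository names are invented for illustration):

```shell
#!/bin/sh
# Throwaway demo: an unsigned history round-trips through
# "fast-export | fast-import" with identical commit hashes.
dir=$(mktemp -d) && cd "$dir"

git init -q src
git -C src -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m 'Initial commit'
git -C src -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m 'Second commit'

git init -q dst
git -C src fast-export --all --reencode=no | git -C dst fast-import --quiet

# Compare the full list of commit hashes on both sides.
a=$(git -C src rev-list --all | sort)
b=$(git -C dst rev-list --all | sort)
[ "$a" = "$b" ] && echo 'round-trip preserved all hashes'
```

Running the same comparison against the protonfixes repository shows the divergence starting exactly at the signed commit.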
* Re: Round-tripping fast-export/import changes commit hashes
  2021-02-27 12:31 Round-tripping fast-export/import changes commit hashes anatoly techtonik
@ 2021-02-27 17:48 ` Elijah Newren
  2021-02-28 10:00   ` anatoly techtonik
  0 siblings, 1 reply; 21+ messages in thread

From: Elijah Newren @ 2021-02-27 17:48 UTC (permalink / raw)
To: anatoly techtonik; +Cc: Git Mailing List

Hi,

On Sat, Feb 27, 2021 at 4:37 AM anatoly techtonik <techtonik@gmail.com> wrote:
>
> Hi.
>
> I can't get the same commit hashes after fast-export and then fast-import of
> this repository without any edits https://github.com/simons-public/protonfixes
> I have no idea what causes this, and how to prevent it from happening. Are
> there any workarounds?

Your second commit is signed. fast-export strips any extended headers on
commits, such as GPG signatures, because there is no way to keep them in
general. In the special case that you aren't making *any* changes to the
repository and will import it as-is, you could theoretically keep the
signatures, but you don't need fast-export in that case, so no one ever
bothered to implement commit signature handling in fast-export and
fast-import. If you make any changes whatsoever to the commits before the
signature (including importing them into a different system), the
signature becomes invalid.

You probably don't want to hear this, but there are no workarounds.
There are also other things that will prevent a simple fast-export |
fast-import pipeline from preserving your history as-is besides signed
commits (most of these are noted in the "Inherited Limitations" section
over at
https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html):

* any other form of extended header; fast-export only looks for the
  headers it knows and exports those
* grafts and replace objects will just get rewritten (and if they cause
  any cycles, those cycles and anything depending on them are dropped)
* commits without an author will be given one matching the committer
  (hopefully you don't have these, but if you do...)
* tags that are missing a tagger are also a problem (hopefully you don't
  have these, but if you do...)
* annotated or signed tags outside the refs/tags/ namespace will get
  renamed weirdly
* commits by default are re-encoded into UTF-8, though I notice you did
  pass --reencode=no to handle this

Hope that at least explains things for you, even if it doesn't give you a
workaround or a solution.
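The first item on the list can be pre-screened for. The sketch below runs on a throwaway repo; the `extraheader` field is forged purely so the check has something to find (`hash-object --literally` stores an unvalidated commit buffer, and the one-line `a` command assumes GNU sed):

```shell
#!/bin/sh
# Flag commits whose raw headers contain fields beyond the ones
# fast-export knows how to re-emit; such commits cannot round-trip.
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m 'plain commit'

# Forge a commit carrying an unknown header (GNU sed; no gpg needed).
forged=$(git cat-file commit HEAD |
    sed '/^committer /a extraheader forged-value' |
    git hash-object -t commit -w --stdin --literally)
git update-ref refs/heads/forged "$forged"

# Header block = everything before the first blank line; anything that
# is not tree/parent/author/committer/encoding (or a continuation line
# starting with a space) is an extended header fast-export will drop.
for c in $(git rev-list --all); do
    if git cat-file commit "$c" | sed '/^$/q' |
        grep -Evq '^(tree|parent|author|committer|encoding)( |$)|^ |^$'; then
        echo "extra header in $c"
    fi
done
```

Only the forged commit is reported; the plain one passes. This is a heuristic sketch, not an official git check.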
* Re: Round-tripping fast-export/import changes commit hashes
  2021-02-27 17:48 ` Elijah Newren
@ 2021-02-28 10:00   ` anatoly techtonik
  2021-02-28 10:34     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread

From: anatoly techtonik @ 2021-02-28 10:00 UTC (permalink / raw)
To: Elijah Newren; +Cc: Git Mailing List

On Sat, Feb 27, 2021 at 8:49 PM Elijah Newren <newren@gmail.com> wrote:
>
> Your second commit is signed. Fast-export strips any extended headers
> on commits, such as GPG signatures, because there's no way to keep
> them in general.

Why is it not possible to encode them with base64 and insert them into
the stream?

> There are also other things that will prevent a simple fast-export |
> fast-import pipeline from preserving your history as-is besides signed
> commits (most of these are noted in the "Inherited Limitations"
> section over at
> https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html):

Is there any way to check which commits will be altered as a result of
`fast-export`, and why? Right now I don't see it being reported.

> Hope that at least explains things for you, even if it doesn't give
> you a workaround or a solution.

Thanks. That is very helpful to know.

The reason I am asking is that I tried to merge two repos with
`reposurgeon`, which operates on `fast-export` data. It is basically
merging a GitHub wiki into the main repo. After successfully merging
them, I still cannot send a PR, because the stripped info makes the
merge produce a huge number of changes. It can be seen here:

https://github.com/simons-public/protonfixes/compare/master...techtonik:master

I tracked this behaviour in `reposurgeon` in this issue:
https://gitlab.com/esr/reposurgeon/-/issues/344

--
anatoly t.
* Re: Round-tripping fast-export/import changes commit hashes
  2021-02-28 10:00 ` anatoly techtonik
@ 2021-02-28 10:34   ` Ævar Arnfjörð Bjarmason
  2021-03-01  7:44     ` anatoly techtonik
  0 siblings, 1 reply; 21+ messages in thread

From: Ævar Arnfjörð Bjarmason @ 2021-02-28 10:34 UTC (permalink / raw)
To: anatoly techtonik; +Cc: Elijah Newren, Git Mailing List

On Sun, Feb 28 2021, anatoly techtonik wrote:

> On Sat, Feb 27, 2021 at 8:49 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> Your second commit is signed. Fast-export strips any extended headers
>> on commits, such as GPG signatures, because there's no way to keep
>> them in general.
>
> Why is it not possible to encode them with base64 and insert into the
> stream?

I think Elijah means that in the general case people are using fast
export/import to export/import between different systems or in
combination with a utility like git-filter-repo.

In those cases users are also changing the content of the repository, so
the hashes will change, invalidating signatures.

But there are also cases where e.g. you don't modify the history, or
only part of it, and could then preserve these headers. I think there's
no inherent reason not to do so, just that nobody's cared enough to
submit patches etc.

>> There are also other things that will prevent a simple fast-export |
>> fast-import pipeline from preserving your history as-is besides signed
>> commits (most of these are noted in the "Inherited Limitations"
>> section over at
>> https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html):
>
> Is there any way to check what commits will be altered as a result of
> `fast-export` and why? Right now I don't see that it is reported.

I don't think so, but not being very familiar with fast export/import I
don't see why it shouldn't have some option to not munge data like
that, or to report it, if someone cared enough to track those issues &
patch it...
>> Hope that at least explains things for you, even if it doesn't give
>> you a workaround or a solution.
>
> Thanks. That is very helpful to know.
>
> The reason I am asking is because I tried to merge two repos with
> `reposurgeon` which operates on `fast-export` data. It is basically
> merging GitHub wiki into main repo,
>
> After successfully merging them I still can not send a PR, because
> it produces a huge amount of changes, because of the stripped info.
> It can be seen here:
>
> https://github.com/simons-public/protonfixes/compare/master...techtonik:master
>
> I tracked this behaviour in `reposurgeon` in this issue
> https://gitlab.com/esr/reposurgeon/-/issues/344
* Re: Round-tripping fast-export/import changes commit hashes
  2021-02-28 10:34 ` Ævar Arnfjörð Bjarmason
@ 2021-03-01  7:44   ` anatoly techtonik
  2021-03-01 17:34     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread

From: anatoly techtonik @ 2021-03-01 7:44 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: Elijah Newren, Git Mailing List

On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> I think Elijah means that in the general case people are using fast
> export/import to export/import between different systems or in
> combination with a utility like git-filter-repo.
>
> In those cases users are also changing the content of the repository, so
> the hashes will change, invalidating signatures.
>
> But there's also cases where e.g. you don't modify the history, or only
> part of it, and could then preserve these headers. I think there's no
> inherent reason not to do so, just that nobody's cared enough to submit
> patches etc.

Is fast-export/import the only way to filter information in `git`? Maybe
there is a slow json-export/import tool that gives a complete
representation of all events in a repository? Or an API that can be used
to serialize and import that stream?

If not, then I'd like to take a look at where header filtering and
serialization take place. My C skills are at the "hello world" level, so
I am not sure I can write a patch, but I can write the logic in Python
and ask somebody to port it.

--
anatoly t.
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-01  7:44 ` anatoly techtonik
@ 2021-03-01 17:34   ` Junio C Hamano
  2021-03-02 21:52     ` anatoly techtonik
  0 siblings, 1 reply; 21+ messages in thread

From: Junio C Hamano @ 2021-03-01 17:34 UTC (permalink / raw)
To: anatoly techtonik
Cc: Ævar Arnfjörð Bjarmason, Elijah Newren, Git Mailing List

anatoly techtonik <techtonik@gmail.com> writes:

> Is fast-export/import the only way to filter information in `git`? Maybe there
> is a slow json-export/import tool that gives a complete representation of all
> events in a repository? Or API that can be used to serialize and import that
> stream?

I do not think representation is a problem. It is just that the output
stream of fast-export is designed to be "filtered", and the expected use
case is to modify the stream somehow before feeding it to fast-import.
And because every object name and commit & tag signature depends on
everything they can reach, even a single bit change in an earlier part
of the history will invalidate any and all signatures on objects that
can reach it. So instead of originally-signed objects whose signatures
are now invalid, a "fast-export | fast-import" pipeline gives you
originally-signed objects whose signatures are stripped.

Admittedly, there is a narrow use case where such signature
invalidation is not an issue. If you run fast-export and feed that
straight into fast-import without doing any modification to the stream,
then you are getting a bit-for-bit identical copy. But "git clone
--mirror" is a much better way to get such a bit-for-bit identical
history and objects. And if you want to do so with sneakernet, you can
create a bundle file, sneakernet it to your destination, and then clone
from the bundle. So...
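The bundle route can be sketched on a throwaway repository (the `src`/`copy` names and the bundle filename are invented; the point is that a bundle transports the original objects, so hashes survive):

```shell
#!/bin/sh
# Sneakernet a repository as a single bundle file, then clone from it.
dir=$(mktemp -d) && cd "$dir"
git init -q src
git -C src -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m 'Initial commit'

# Pack HEAD and all refs into one file; this is the sneakernet artifact.
git -C src bundle create repo.bundle HEAD --all
git clone -q src/repo.bundle copy

# Bit-for-bit: the clone's history has the original hashes.
[ "$(git -C src rev-parse HEAD)" = "$(git -C copy rev-parse HEAD)" ] &&
    echo 'hashes preserved'
```

Unlike the fast-export pipeline, nothing here re-serializes commits, so signed commits come through untouched.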
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-01 17:34 ` Junio C Hamano
@ 2021-03-02 21:52   ` anatoly techtonik
  2021-03-03  7:13     ` Johannes Sixt
  0 siblings, 1 reply; 21+ messages in thread

From: anatoly techtonik @ 2021-03-02 21:52 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, Elijah Newren, Git Mailing List

On Mon, Mar 1, 2021 at 8:34 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> It is just that the output stream of fast-export is designed to be
> "filtered" and the expected use case is to modify the stream somehow
> before feeding it to fast-import. And because every object name and
> commit & tag signature depends on everything that they can reach,
> even a single bit change in an earlier part of the history will
> invalidate any and all signatures on objects that can reach it. So
> instead of originally-signed objects whose signatures are now
> invalid, "fast-export | fast-import" pipeline would give you
> originally-signed objects whose signatures are stripped.

I need to merge two unrelated repos, and I am using `reposurgeon`
(http://www.catb.org/~esr/reposurgeon/repository-editing.html) to do
this while preserving timestamps and commit order. Its model of
operation is that it reads revisions into memory from git using
fast-export, operates on them, and then rebuilds the stream back into a
git repo with fast-import. The problem is that in the exported dump the
information is already lost, and the resulting commits are "not
mergeable". Basically, all GitHub repositories where people edited
`README.md` online are "not mergeable" after this point, because all
commits edited on GitHub are signed.

For my use case, where I just need to attach another branch in time
without altering the original commits in any way, `reposurgeon` cannot
be used.

> Admittedly, there is a narrow use case where such a signature
> invalidation is not an issue.
> If you run fast-export and feed that
> straight into fast-import without doing any modification to the
> stream, then you are getting a bit-for-bit identical copy.

I did just that, and signatures got stripped, altering history.

    git -C protonfixes fast-export --all --reencode=no | (cd protoimported && git fast-import)

--
anatoly t.
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-02 21:52 ` anatoly techtonik
@ 2021-03-03  7:13   ` Johannes Sixt
  2021-03-04  0:55     ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread

From: Johannes Sixt @ 2021-03-03 7:13 UTC (permalink / raw)
To: anatoly techtonik
Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Git Mailing List

Am 02.03.21 um 22:52 schrieb anatoly techtonik:
> For my use case, where I just need to attach another branch in
> time without altering original commits in any way, `reposurgeon`
> can not be used.

What do you mean by "attach another branch in time"? Because if you
really do not want to alter the original commits in any way, perhaps you
only want `git fetch /the/other/repository master:the-other-one-s-master`?

-- Hannes
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-03  7:13 ` Johannes Sixt
@ 2021-03-04  0:55   ` Junio C Hamano
  2021-08-09 15:45     ` anatoly techtonik
  0 siblings, 1 reply; 21+ messages in thread

From: Junio C Hamano @ 2021-03-04 0:55 UTC (permalink / raw)
To: Johannes Sixt
Cc: anatoly techtonik, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Git Mailing List

Johannes Sixt <j6t@kdbg.org> writes:

> Am 02.03.21 um 22:52 schrieb anatoly techtonik:
>> For my use case, where I just need to attach another branch in
>> time without altering original commits in any way, `reposurgeon`
>> can not be used.
>
> What do you mean by "attach another branch in time"? Because if you
> really do not want to alter original commits in any way, perhaps you
> only want `git fetch /the/other/repository master:the-other-one-s-master`?

Yeah, I had the same impression. If a bit-for-bit identical copy of the
original history is needed, then fetching from the original repository
(either directly or via a bundle) would be a much simpler and more
performant way.

Thanks.
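The fetch-based approach can be sketched with two throwaway repositories (the `main`/`wiki` names and the `wiki-history` branch are invented): the other history is attached under a new branch name without any rewriting.

```shell
#!/bin/sh
# Attach an unrelated repository's history as a branch, unmodified.
dir=$(mktemp -d) && cd "$dir"
for r in main wiki; do
    git init -q "$r"
    git -C "$r" -c user.name=a -c user.email=a@example.com \
        commit -q --allow-empty -m "$r: initial commit"
done

# Fetch the wiki's HEAD into a new local branch of the main repo.
git -C main fetch -q ../wiki HEAD:refs/heads/wiki-history

# The fetched tip is bit-for-bit the wiki's original commit.
[ "$(git -C main rev-parse wiki-history)" = "$(git -C wiki rev-parse HEAD)" ] &&
    echo 'unchanged'
```

Since fetch transfers objects rather than re-serializing them, signatures and hashes survive; merging or grafting the histories afterwards is a separate step.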
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-04  0:55 ` Junio C Hamano
@ 2021-08-09 15:45   ` anatoly techtonik
  2021-08-09 18:15     ` Elijah Newren
  0 siblings, 1 reply; 21+ messages in thread

From: anatoly techtonik @ 2021-08-09 15:45 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Sixt, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Git Mailing List

On Thu, Mar 4, 2021 at 3:56 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Sixt <j6t@kdbg.org> writes:
>
> > Am 02.03.21 um 22:52 schrieb anatoly techtonik:
> >> For my use case, where I just need to attach another branch in
> >> time without altering original commits in any way, `reposurgeon`
> >> can not be used.
> >
> > What do you mean by "attach another branch in time"? Because if you
> > really do not want to alter original commits in any way, perhaps you
> > only want `git fetch /the/other/repository master:the-other-one-s-master`?
>
> Yeah, I had the same impression. If a bit-for-bit identical copy of
> the original history is needed, then fetching from the original
> repository (either directly or via a bundle) would be a much simpler
> and performant way.

The goal is to have an editable stream which, if left without edits,
would be bit-for-bit identical, so that external tools like
`reposurgeon` could operate on that stream and be audited. Right now,
because the repository https://github.com/simons-public/protonfixes
contains a signed commit right from the start, a simple fast-export and
fast-import with git itself fails that check.

I understand that patching `git` to add `--complete` to fast-import is
realistically beyond my coding abilities, and my only option is to parse
the binary stream produced by `git cat-file --batch`, which I also won't
be able to do without a specification.

P.S. I am resurrecting this old thread because my problem with editing
the history of a repository with an external tool still cannot be
solved.

--
anatoly t.
* Re: Round-tripping fast-export/import changes commit hashes
  2021-08-09 15:45 ` anatoly techtonik
@ 2021-08-09 18:15   ` Elijah Newren
  2021-08-10 15:51     ` anatoly techtonik
  0 siblings, 1 reply; 21+ messages in thread

From: Elijah Newren @ 2021-08-09 18:15 UTC (permalink / raw)
To: anatoly techtonik
Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Mon, Aug 9, 2021 at 8:45 AM anatoly techtonik <techtonik@gmail.com> wrote:
>
> On Thu, Mar 4, 2021 at 3:56 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Johannes Sixt <j6t@kdbg.org> writes:
> >
> > > Am 02.03.21 um 22:52 schrieb anatoly techtonik:
> > >> For my use case, where I just need to attach another branch in
> > >> time without altering original commits in any way, `reposurgeon`
> > >> can not be used.
> > >
> > > What do you mean by "attach another branch in time"? Because if you
> > > really do not want to alter original commits in any way, perhaps you
> > > only want `git fetch /the/other/repository master:the-other-one-s-master`?
> >
> > Yeah, I had the same impression. If a bit-for-bit identical copy of
> > the original history is needed, then fetching from the original
> > repository (either directly or via a bundle) would be a much simpler
> > and performant way.
>
> The goal is to have an editable stream, which, if left without edits, would
> be bit-by-bit identical, so that external tools like `reposurgeon` could
> operate on that stream and be audited.

There were some patches proposed some months back [1] to make
fast-import allow importing signed commits... except that they
unconditionally kept the signatures and didn't do any validation, which
would have resulted in invalid signatures if any edits happened. I
suggested adding signature verification (which would allow options like
erroring out if they didn't match, or dropping signatures when they
didn't match but keeping them otherwise). That'd help usecases like
yours.
The author wasn't interested in implementing that suggestion (and it's a
low enough priority for me that I may never get around to it). The
series also wasn't pushed through and eventually was dropped.

However, that wouldn't fully solve your stated goal. As already
mentioned earlier in this thread, I don't think your stated goal is
realistic; the only complete bit-for-bit identical representation of the
repository is the original binary format. Your stated goal here,
however, isn't required for solving the usecase you present.

[1] https://lore.kernel.org/git/20210430232537.1131641-1-lukeshu@lukeshu.com/

> Right now, because the repository
> https://github.com/simons-public/protonfixes contains a signed commit
> right from the start, the simple fast-export and fast-import with git itself
> fails the check.

Yes, and I mentioned several other reasons why a round-trip from
fast-export through fast-import cannot be relied upon to preserve object
hashes.

> I understand that patching `git` to add `--complete` to fast-import is
> realistically beyond my coding abilities, and my only option is to parse

It's more patching than that which would be required:

(1) It'd be both fast-export and fast-import that would need patching,
not just fast-import.

(2) --complete is a bit of a misnomer too, because it's not just
get-all-the-data, it's keep-the-data-in-the-original-format. If objects
had modes of 040000 instead of 40000, despite meaning the same thing,
you'd have to prevent canonicalization and store them as the original
recorded value or you'd get a different hash. Ditto for commit messages
with extra data after a NUL byte, and a variety of other possible
issues.

(3) fast-export works by looking for the relevant bits it knows how to
export. You'd have to redesign it to fully parse every bit of data in
each object it looks at, throw errors if it didn't recognize any, and
make sure it exports all the bits. That might be difficult since it's
hard to know how to future-proof it.
How do you guarantee you've printed every field in a commit struct, when
that struct might gain new fields in the future? (This is especially
challenging since fast-export/fast-import might not be considered core
tools, or at least don't get as much attention as the "truly core" parts
of git; see
https://lore.kernel.org/git/xmqq36mxdnpz.fsf@gitster-ct.c.googlers.com/)

> the binary stream produced by `git cat-file --batch`, which I also won't
> be able to do without specification.

The specification is already available in the manual. Just run `git
cat-file --help` to see it. Let me quote part of it for you:

    For example, --batch without a custom format would produce:

        <sha1> SP <type> SP <size> LF
        <contents> LF

> P.S. I am resurrecting the old thread, because my problem with editing
> the history of the repository with an external tool still can not be solved.

Sure it can: just use fast-export's --reference-excluded-parents option
and don't export commits you know you won't need to change.

Or, if for some reason you are really set on exporting everything and
then editing, then go ahead and create the full fast-export output,
including all your edits, and then post-process it manually before
feeding it to fast-import. In particular, in the post-processing step,
find the problematic commits that you know won't be modified, such as
your signed commit. Then edit that fast-export dump and (a) remove the
dump of the no-longer-signed commit (because you don't want it), and (b)
replace any references to the no-longer-signed commit (e.g. "from :12")
with the hash of the actual original signed commit (e.g. "from
d3d24b63446c7d06586eaa51764ff0c619113f09"). If you do that, then git
fast-import will just build the new commits on the existing signed
commit instead of on some new commit that is missing the signature.
Technically, you can even skip step (a), as all it will do is produce an
extra commit in your repository that isn't used and thus will be garbage
collected later.
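Step (b) of the post-processing above can be sketched mechanically; the mark `:12`, the sha, and the two-line dump fragment below are illustrative values taken from the message, so substitute the ones from your own dump:

```shell
#!/bin/sh
# Post-process a fast-export dump: point a "from" reference at the
# original signed commit's hash instead of the stream-local mark.
dir=$(mktemp -d) && cd "$dir"

# A two-line fragment of a dump, just enough for the rewrite demo;
# a real dump also carries committer, message, and file changes.
cat > dump.fe <<'EOF'
commit refs/heads/master
from :12
EOF

# :12 and the sha are the placeholder values from the example above.
sed 's/^from :12$/from d3d24b63446c7d06586eaa51764ff0c619113f09/' \
    dump.fe > fixed.fe
cat fixed.fe
```

The full, rewritten dump would then be fed to `git fast-import` in the target repository, which builds the new commits directly on the existing signed commit.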
* Re: Round-tripping fast-export/import changes commit hashes
  2021-08-09 18:15 ` Elijah Newren
@ 2021-08-10 15:51   ` anatoly techtonik
  2021-08-10 17:57     ` Elijah Newren
  0 siblings, 1 reply; 21+ messages in thread

From: anatoly techtonik @ 2021-08-10 15:51 UTC (permalink / raw)
To: Elijah Newren
Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Mon, Aug 9, 2021 at 9:15 PM Elijah Newren <newren@gmail.com> wrote:
>
> The author wasn't interested in implementing that
> suggestion (and it's a low priority for me that I may never get around
> to). The series also wasn't pushed through and eventually was
> dropped.

What does it take to validate the commit signature? Isn't it the same as
validating a tag signature?

Is it possible to merge at least the `fast-export` part? The effect of
the round-trip would be the same, but at least external tools would be
able to detect signed commits and warn users.

> [1] https://lore.kernel.org/git/20210430232537.1131641-1-lukeshu@lukeshu.com/

> Yes, and I mentioned several other reasons why a round-trip from
> fast-export through fast-import cannot be relied upon to preserve
> object hashes.

Yes, I understand that. What would be the recommended way to detect
which commits would change as a result of the round-trip? It would then
be possible to warn users in the `reposurgeon` `lint` command.

> (3) fast-export works by looking for the relevant bits it knows how to
> export. You'd have to redesign it to fully parse every bit of data in
> each object it looks at, throw errors if it didn't recognize any, and
> make sure it exports all the bits. That might be difficult since it's
> hard to know how to future proof it. How do you guarantee you've
> printed every field in a commit struct, when that struct might gain
> new fields in the future?
> (This is especially challenging since
> fast-export/fast-import might not be considered core tools, or at
> least don't get as much attention as the "truly core" parts of git;
> see https://lore.kernel.org/git/xmqq36mxdnpz.fsf@gitster-ct.c.googlers.com/)

Looks like the only way to make it forward-compatible is to introduce
some kind of versioning and a validation schema like protobuf.
Otherwise, writing an importer and exporter for each and every thing
that one may encounter in a git stream may be unrealistic, yes.

> > P.S. I am resurrecting the old thread, because my problem with editing
> > the history of the repository with an external tool still can not be solved.
>
> Sure it can, just use fast-export's --reference-excluded-parents
> option and don't export commits you know you won't need to change.

How does `--reference-excluded-parents` help to read signed commits?
`reposurgeon` needs all commits to select those that are needed by
different criteria. It is hard to tell which commits are unimportant
without reading and processing them first.

> Or, if for some reason you are really set on exporting everything and
> then editing, then go ahead and create the full fast-export output,
> including with all your edits, and then post-process it manually
> before feeding to fast-import. In particular, in the post-processing
> step find the commits that were problematic that you know won't be
> modified, such as your signed commit. Then go edit that fast-export
> dump and (a) remove the dump of the no-longer-signed signed commit
> (because you don't want it), and (b) replace any references to the
> no-longer-signed-commit (e.g. "from :12") to instead use the hash of
> the actual original signed commit (e.g. "from
> d3d24b63446c7d06586eaa51764ff0c619113f09"). If you do that, then git
> fast-import will just build the new commits on the existing signed
> commit instead of on some new commit that is missing the signature.
> Technically, you can even skip step (a), as all it will do is produce
> an extra commit in your repository that isn't used and thus will be
> garbage collected later.

The problem is detecting the problematic signed commits in the first
place, because as I understand it, `fast-export` gives no sign of
whether commits were signed before the export.

--
anatoly t.
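fast-export indeed gives no such sign, but the gpgsig header itself is easy to look for before exporting. A sketch on a throwaway repo (the signature is forged with `hash-object --literally` and GNU sed purely so the detector has something to find; with gpg available, `git log --format='%H %G?'` is another option):

```shell
#!/bin/sh
# Detect signed commits up front: any commit whose header block
# contains "gpgsig" will lose its signature (and change hash) across
# a fast-export | fast-import round-trip.
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m 'unsigned commit'

# Forge a gpgsig header so there is something to detect (no gpg needed).
signed=$(git cat-file commit HEAD |
    sed '/^committer /a gpgsig -----BEGIN PGP SIGNATURE-----' |
    git hash-object -t commit -w --stdin --literally)
git update-ref refs/heads/signed "$signed"

git rev-list --all | while read -r c; do
    if git cat-file commit "$c" | sed '/^$/q' | grep -q '^gpgsig'; then
        echo "signed: $c"
    fi
done
```

Running this check before an export would let a tool like `reposurgeon` warn that the listed commits cannot round-trip unchanged.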
* Re: Round-tripping fast-export/import changes commit hashes
  2021-08-10 15:51 ` anatoly techtonik
@ 2021-08-10 17:57   ` Elijah Newren
  2022-12-11 18:30     ` anatoly techtonik
  0 siblings, 1 reply; 21+ messages in thread

From: Elijah Newren @ 2021-08-10 17:57 UTC (permalink / raw)
To: anatoly techtonik
Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Tue, Aug 10, 2021 at 8:51 AM anatoly techtonik <techtonik@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 9:15 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > The author wasn't interested in implementing that
> > suggestion (and it's a low priority for me that I may never get around
> > to). The series also wasn't pushed through and eventually was
> > dropped.
>
> What it takes to validate the commit signature?

I'm not familiar with any of the gpg libraries, and don't even have an
active gpg key, so I don't know. Some quick grepping shows that we have
gpg-interface.[ch], so we have some functions we can apparently call.

> Isn't it the same as validating commit tag?

gpg signatures of tags are somewhat different from gpg signatures of
commits:

* gpg signatures for tags are simply part of the annotated tag message
* gpg signatures for commits are stored in a separate commit header, not
  just as extra text at the end of the commit message

This gpg signature handling for tags means that fast-import isn't even
aware of whether a tag is signed; it simply sees a tag message and
records it. fast-export also would have been unaware and just exported
them as-is if someone hadn't written some special parsing for it.
fast-import would need to do similar special parsing to become aware of
whether tags are signed or not. For now, fast-import just keeps any tag
messages as-is, and thus potentially writes invalid tag signatures.
(The only way people have to control this is at the fast-export side
with the --signed-tags flag, which gives you the choices of aborting,
stripping, or keeping the signatures even though they'll likely be
wrong.) If fast-import were to gain knowledge of tag signatures and an
ability to validate them, it could offer smarter options like
keep-if-valid-and-discard-otherwise.

In contrast, the fact that gpg signatures for commits have to be
recorded as a separate commit header means they cannot be recorded by
fast-import without additional code changes. And both the fast-export
and fast-import sides have to be made aware of, and specially handle,
the commit signatures for them to even get propagated, let alone
validated.

> Is it possible to merge at least the `--fast-export`
> part? The effect of roundtrip would be the same, but at least external
> tools would be able to detect signed commits and warn users.

The fact that it wasn't merged suggests there was some issue raised in
feedback that wasn't addressed. I don't remember if that was the case
or not, but someone would have to find out, address any remaining
issues pointed out by feedback, and champion it through. Personally, I
don't like shoving a half solution through and think there needs to be
validation on the fast-import side added at the same time, but others
may disagree with me. I have plenty of other projects to work on,
though, so whoever does the work will more likely be the one to decide.

> > [1] https://lore.kernel.org/git/20210430232537.1131641-1-lukeshu@lukeshu.com/
> >
> > Yes, and I mentioned several other reasons why a round-trip from
> > fast-export through fast-import cannot be relied upon to preserve
> > object hashes.
>
> Yes, I understand that. What would be the recommended way to detect
> which commits would change as a result of the round-trip? It will then
> be possible to warn users in `reposurgeon` `lint` command.
There is no function or command that would check that kind of thing short of doing the round-trip. I provided a list of reasons IDs could change as a starting point in case anyone wanted to try to write a function or command that could check, and to point out that it is a long list and might grow in the future. I think practically, if you're doing a one-shot export (as I originally assumed from your email), that you'd find out and then just manually fix things up by hand. If your goal is writing or changing a general purpose filtering tool, then I'd suggest instead using the alternate technique I outlined in the other thread you started at [2]. [2] https://lore.kernel.org/git/CABPp-BH4dcsW52immJpTjgY5LjaVfKrY9MaUOnKT3byi2tBPpg@mail.gmail.com/ > > (3) fast-export works by looking for the relevant bits it knows how to > > export. You'd have to redesign it to fully parse every bit of data in > > each object it looks at, throw errors if it didn't recognize any, and > > make sure it exports all the bits. That might be difficult since it's > > hard to know how to future proof it. How do you guarantee you've > > printed every field in a commit struct, when that struct might gain > > new fields in the future? (This is especially challenging since > > fast-export/fast-import might not be considered core tools, or at > > least don't get as much attention as the "truly core" parts of git; > > see https://lore.kernel.org/git/xmqq36mxdnpz.fsf@gitster-ct.c.googlers.com/) > > Looks like the only way to make it forward compatible is to introduce > some kind of versioning and a validation schema like protobuf. Otherwise > writing an importer and exporter for each and every thing that may > encounter in a git stream may be unrealistic, yes. > > > > P.S. I am resurrecting the old thread, because my problem with editing > > > the history of the repository with an external tool still can not be solved. 
> > > > Sure it can, just use fast-export's --reference-excluded-parents > > option and don't export commits you know you won't need to change. > > How does `--reference-excluded-parents` help to read signed commits? It doesn't. I was assuming you were doing a one shot export, namely of the repository you linked to, https://github.com/simons-public/protonfixes, and that you already knew which commits were not going to be changed (because you pointed them out in your email to the list) -- and in fact that it was only a single commit affected, as you mentioned. Armed with that knowledge, you could just export the parts of the repository AFTER that commit, and use --reference-excluded-parents to make sure the fast-export stream built upon them rather than squashing all changes up to that point into the first commit in the stream. > `reposurgeon` needs all commits to select those that are needed by > different criteria. It is hard to tell which commits are not important without > reading and processing them first. Right, so you aren't trying to just handle this one repository, but modify/create a general purpose tool that does so. See my response in the other thread you started, again at [2] above. > > Or, if for some reason you are really set on exporting everything and > > then editing, then go ahead and create the full fast-export output, > > including with all your edits, and then post-process it manually > > before feeding to fast-import. In particular, in the post-processing > > step find the commits that were problematic that you know won't be > > modified, such as your signed commit. Then go edit that fast-export > > dump and (a) remove the dump of the no-longer-signed signed commit > > (because you don't want it), and (b) replace any references to the > > no-longer-signed-commit (e.g. "from :12") to instead use the hash of > > the actual original signed commit (e.g. "from > > d3d24b63446c7d06586eaa51764ff0c619113f09"). 
If you do that, then git > > fast-import will just build the new commits on the existing signed > > commit instead of on some new commit that is missing the signature. > > Technically, you can even skip step (a), as all it will do is produce > > an extra commit in your repository that isn't used and thus will be > > garbage collected later. > > The problem is to detect problematic signed commits, because as I > understand `fast-export` doesn't give any signs if commits were signed > before the export. Signed commits is just one issue, and you'll have to add special code to handle a bunch of other special cases if you go down this route. I'd rephrase the problem. You want to know when _your tool_ (e.g. reposurgeon since you refer to it multiple times; I'm guessing you're contributing to it?) has not modified a commit or any of its ancestors, and when it hasn't, then _your tool_ should remove that commit from the fast-export stream and replace any references to it by the original commit's object id. I outlined how to do this in [2], referenced above, making use of the --show-original-ids flag to fast-export. If you do that, then for any commits which you haven't modified (including not modifying any of its ancestors), then you'll keep the same commits as-is with no stripping of gpg-signatures or canonicalization of objects, so that you'll have the exact same commit IDs. Further, you can do this today, without any changes to git fast-export or git fast-import. ^ permalink raw reply [flat|nested] 21+ messages in thread
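The `--show-original-ids` mechanism described above can be sketched with a toy repository (all paths and names below are illustrative, not from the thread): the flag simply adds an `original-oid` line per object to the export stream, which a filtering tool can use to map stream commits back to the original object ids.

```shell
# Sketch: --show-original-ids adds an "original-oid" line to each object
# in the export stream, which a rewriting tool can use to map stream
# commits back to original object ids. Toy repository; names illustrative.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'first commit'
# Count the original-oid lines in the stream (one per exported object here)
n=$(git fast-export --show-original-ids --all | grep -c '^original-oid ')
echo "original-oid lines: $n"
```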
* Re: Round-tripping fast-export/import changes commit hashes 2021-08-10 17:57 ` Elijah Newren @ 2022-12-11 18:30 ` anatoly techtonik 2023-01-13 7:21 ` Elijah Newren 0 siblings, 1 reply; 21+ messages in thread From: anatoly techtonik @ 2022-12-11 18:30 UTC (permalink / raw) To: Elijah Newren Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason, Git Mailing List On Tue, Aug 10, 2021 at 8:58 PM Elijah Newren <newren@gmail.com> wrote: > > On Tue, Aug 10, 2021 at 8:51 AM anatoly techtonik <techtonik@gmail.com> wrote: > > > > On Mon, Aug 9, 2021 at 9:15 PM Elijah Newren <newren@gmail.com> wrote: > > > > > [2] https://lore.kernel.org/git/CABPp-BH4dcsW52immJpTjgY5LjaVfKrY9MaUOnKT3byi2tBPpg@mail.gmail.com/ > > Signed commits is just one issue, and you'll have to add special code > to handle a bunch of other special cases if you go down this route. > I'd rephrase the problem. You want to know when _your tool_ (e.g. > reposurgeon since you refer to it multiple times; I'm guessing you're > contributing to it?) has not modified a commit or any of its > ancestors, and when it hasn't, then _your tool_ should remove that > commit from the fast-export stream and replace any references to it by > the original commit's object id. I outlined how to do this in [2], > referenced above, making use of the --show-original-ids flag to > fast-export. If you do that, then for any commits which you haven't > modified (including not modifying any of its ancestors), then you'll > keep the same commits as-is with no stripping of gpg-signatures or > canonicalization of objects, so that you'll have the exact same commit > IDs. Further, you can do this today, without any changes to git > fast-export or git fast-import. Took me a while to process the reply. Let's recap. I want to make a roundtrip export/import of https://github.com/simons-public/protonfixes which should get exactly the same repository. 
# --- fast-export to exported.txt git clone https://github.com/simons-public/protonfixes git -C protonfixes fast-export --all > exported.txt # --- check revision of the repo git -C protonfixes rev-parse HEAD # 681411ba8ceb5d2d790e674eb7a5b98951d426e6 # --- fast-import into new repo git init newrepo git -C newrepo fast-import < exported.txt # --- checking revision of the new repo git -C newrepo rev-parse HEAD # 9888762d7857d9721f0c354e7fc187a199754a4b Hashes don't match. The roundtrip fails. Let's see if --reference-excluded-parents helps. # --- export below produces the same export stream as above git -C protonfixes fast-export --reference-excluded-parents --all > exported_parents.txt Because fast-import/fast-export don't work, you propose to keep the old repo around until it is clear which commits I am going to modify. Then make a new fast-export starting from the first commit I am going to modify with --reference-excluded-parents flag. Is that correct so far? Then given this partial export and old repo, how to init the new repo that fast-import can apply its tail there? What if I have multiple commits that I modify, but I don't know which of their parents was first? And when I touch commits from different branches, how to recreate their parent history intact in one repo? -- anatoly t. ^ permalink raw reply [flat|nested] 21+ messages in thread
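For contrast with the failing protonfixes round-trip above, a minimal unsigned history does round-trip with identical hashes (a sketch; it is the signed root commit in that particular repository that breaks the round-trip there, and repository paths here are illustrative):

```shell
# Sketch: a trivial unsigned history round-trips byte-for-byte through
# fast-export/fast-import, so the hashes match. Paths are illustrative.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/orig"
git -C "$tmp/orig" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'unsigned commit'
before=$(git -C "$tmp/orig" rev-parse HEAD)
git init -q "$tmp/copy"
git -C "$tmp/orig" fast-export --reencode=no --all \
    | git -C "$tmp/copy" fast-import --quiet
after=$(git -C "$tmp/copy" rev-parse HEAD)
echo "$before $after"
```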
* Re: Round-tripping fast-export/import changes commit hashes 2022-12-11 18:30 ` anatoly techtonik @ 2023-01-13 7:21 ` Elijah Newren 0 siblings, 0 replies; 21+ messages in thread From: Elijah Newren @ 2023-01-13 7:21 UTC (permalink / raw) To: anatoly techtonik Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason, Git Mailing List On Sun, Dec 11, 2022 at 10:30 AM anatoly techtonik <techtonik@gmail.com> wrote: > > On Tue, Aug 10, 2021 at 8:58 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Tue, Aug 10, 2021 at 8:51 AM anatoly techtonik <techtonik@gmail.com> wrote: > > > > > > On Mon, Aug 9, 2021 at 9:15 PM Elijah Newren <newren@gmail.com> wrote: > > > > > > > > [2] https://lore.kernel.org/git/CABPp-BH4dcsW52immJpTjgY5LjaVfKrY9MaUOnKT3byi2tBPpg@mail.gmail.com/ > > > > Signed commits is just one issue, and you'll have to add special code > > to handle a bunch of other special cases if you go down this route. > > I'd rephrase the problem. You want to know when _your tool_ (e.g. > > reposurgeon since you refer to it multiple times; I'm guessing you're > > contributing to it?) has not modified a commit or any of its > > ancestors, and when it hasn't, then _your tool_ should remove that > > commit from the fast-export stream and replace any references to it by > > the original commit's object id. I outlined how to do this in [2], > > referenced above, making use of the --show-original-ids flag to > > fast-export. If you do that, then for any commits which you haven't > > modified (including not modifying any of its ancestors), then you'll > > keep the same commits as-is with no stripping of gpg-signatures or > > canonicalization of objects, so that you'll have the exact same commit > > IDs. Further, you can do this today, without any changes to git > > fast-export or git fast-import. > > Took me a while to process the reply. Let's recap. 
> > I want to make a roundtrip export/import of > https://github.com/simons-public/protonfixes which should get exactly > the same repository. As I've stated a few times in the thread, this request of yours is simply impossible for general repositories ([1] contains the best summary of the reasons). For the specific repository in question, the only relevant roadblocker is the presence of a signed commit which happens to be a root commit. That opens the door to some workarounds that could be used with this specific repository. [1] https://lore.kernel.org/git/CABPp-BGDB6jj+Et44D6D22KXprB89dNpyS_AAu3E8vOCtVaW1A@mail.gmail.com/ I provided two workarounds you could try to use for your specific case at [2] and [3], one of which you ask about below. [2] https://lore.kernel.org/git/CABPp-BE=9wzF6_VypoR-uEPHsLWdV7zyE13FOgLK0h8NOcMz3g@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BH4dcsW52immJpTjgY5LjaVfKrY9MaUOnKT3byi2tBPpg@mail.gmail.com/ > # --- fast-export to exported.txt > git clone https://github.com/simons-public/protonfixes > git -C protonfixes fast-export --all > exported.txt > # --- check revision of the repo > git -C protonfixes rev-parse HEAD > # 681411ba8ceb5d2d790e674eb7a5b98951d426e6 > > # --- fast-import into new repo > git init newrepo > git -C newrepo fast-import < exported.txt > # --- checking revision of the new repo > git -C newrepo rev-parse HEAD > # 9888762d7857d9721f0c354e7fc187a199754a4b > > Hashes don't match. The roundtrip fails. As expected, given that one of your commits is signed. > Let's see if --reference-excluded-parents helps. > > # --- export below produces the same export stream as above > git -C protonfixes fast-export --reference-excluded-parents --all > > exported_parents.txt --reference-excluded-parents only has effect if there are excluded parents. You didn't exclude any parents, so obviously adding this flag isn't going to change anything. 
You should instead first clone/fetch the part of history up to the commits you want to keep intact (e.g. the signed commits), and then run a command like

git -C protonfixes fast-export --reference-excluded-parents ^${BASECOMMIT1} ^${BASECOMMIT2} ^${BASECOMMITN} --all | git -C newrepo fast-import

Note that the examples I gave you (e.g. [2] above) all used some excluded references (e.g. "^master~5"). > Because fast-import/fast-export don't work You have not yet identified a bug in either, so I disagree with this comment. >, you propose to keep the old > repo around until it is clear which commits I am going to modify. The framing of this statement looks really weird to me. You have posed your problem in the form of doing some kind of export/import operation, which is fine. However, in order to do an export operation, you obviously need the repository in order to export it. So why are you calling out that you keep the repo around until you run the fast-export command? Anyway, that aside... I was just saying that (1) signed commits exist as a method to assure other users that the commits have not been modified (2) fast-export and fast-import exist to allow you to modify history in some fashion (and are separate steps so people can edit the stream between running the two commands) (3) the above two imply that if you still want users to be able to verify the signed commits, that signed commits should NOT be sent through fast-export and fast-import (4) therefore, if you want the signed commits kept as-is, you should simply fetch the history up to and including those, and only send the remainder of the history through fast-export/fast-import. But I will add here one additional thing: If you're weaving repositories together, that likely changes the parent(s) of some of the commits. Once you change the parent(s) of a commit, that alone changes the commit and invalidates any signature it has.
In your case you seem to only have a root commit that is signed, and if you keep that signed commit as a root commit, you can avoid this problem. But, in general, if signed commits are involved in the weaving such that they gain new parents, then what you want to do is simply impossible; you will not be able to keep the signatures in such a case (and the commit ids will change as well). > Then > make a new fast-export starting from the first commit I am going to > modify with --reference-excluded-parents flag. Is that correct so far? You have the basic idea, but you are making things excessively complex with one detail here -- it does not need to start with the first commit you are going to modify; it can start earlier. You can simply export all commits after the one(s) you know you don't want to change. For example, if the history looks like this: A---B---C---D---E---F and commits A and B are the only signed commits (which you want to preserve) and commit D is the first one you are going to modify, you could still run fast-export on "^A ^B F" (i.e. C, D, E, and F in this case) -- that will also include C, but C isn't signed and round-trips without problems, so it doesn't hurt to include it. > Then given this partial export and old repo, how to init the new repo > that fast-import can apply its tail there? Flag the signed commit(s) with a branch or branches of some sort, then fetch just those branches into the new repo. > What if I have multiple commits that I modify, but I don't know which > of their parents was first? I wouldn't bother trying to figure out which one(s) is/are first. (I mean, you could do some revision walking to figure that out, in which case you'd have to fetch more than just the history of the signed commits you want to keep but everything prior to whatever first commit(s) you want to modify.) Instead, I'd just do the easier thing I noted above -- use the signed commits as exclusion markers. 
> And when I touch commits from different > branches, how to recreate their parent history intact in one repo? Place temporary branches pointing directly to each of the signed commits you want to keep intact (which also implies you are keeping all the history behind those commits intact as well), then run:

git -C newrepo fetch PATH_OR_URL_OF_OLD_REPO ${TEMPBRANCH1} ${TEMPBRANCH2} ${TEMPBRANCHN}

Then use the earlier suggestion of

git -C protonfixes fast-export --reference-excluded-parents ^${TEMPBRANCH1} ^${TEMPBRANCH2} ^${TEMPBRANCHN} --all | git -C newrepo fast-import

to get the remainder of the history exported/imported. I will also add that since you are interested in attempting to round-trip through fast-export/fast-import and still end up with the same hashes (ignoring a few fundamental shortcomings mentioned earlier in this thread that won't always permit this to work), you can at least get closer by adding "--reencode=no" to fast-export (so that it doesn't alter commit messages) and setting core.ignorecase=false for at least the fast-import invocation (so that fast-import doesn't make files which differ only in case clobber each other while importing). But, again, that only addresses like two issues out of half a dozen. Again, see the link at [1] earlier in this email. ^ permalink raw reply [flat|nested] 21+ messages in thread
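The two-step recipe above (fetch the intact history via temporary branches, then export only the rest with `--reference-excluded-parents`) can be sketched end-to-end with a toy repository; all paths and branch names here are illustrative:

```shell
# Sketch of the recipe: mark the history to keep with a temporary branch,
# fetch it into the new repo as-is, then export only the remainder with
# --reference-excluded-parents. All names are illustrative.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/old"
cd "$tmp/old"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'history to keep intact'
git branch tempbranch1            # marks the commits to fetch unchanged
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'history to re-export'
tip=$(git rev-parse HEAD)
git init -q "$tmp/new"
# Step 1: bring the intact history over unchanged
git -C "$tmp/new" fetch -q "$tmp/old" \
    refs/heads/tempbranch1:refs/heads/tempbranch1
# Step 2: export only the newer commits; "from" lines reference the
# excluded parents by their original ids, which now exist in the new repo
git fast-export --reference-excluded-parents ^tempbranch1 --all \
    | git -C "$tmp/new" fast-import --quiet
echo "old tip: $tip"
echo "new tip: $(git -C "$tmp/new" rev-parse HEAD)"
```

Since nothing was actually modified in this sketch, the tip commit ids come out identical.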
* Re: Round-tripping fast-export/import changes commit hashes 2021-03-01 7:44 ` anatoly techtonik 2021-03-01 17:34 ` Junio C Hamano @ 2021-03-01 18:06 ` Elijah Newren 2021-03-01 20:04 ` Ævar Arnfjörð Bjarmason 2021-03-02 22:12 ` anatoly techtonik 2021-03-01 20:02 ` Ævar Arnfjörð Bjarmason 2 siblings, 2 replies; 21+ messages in thread From: Elijah Newren @ 2021-03-01 18:06 UTC (permalink / raw) To: anatoly techtonik Cc: Ævar Arnfjörð Bjarmason, Git Mailing List On Sun, Feb 28, 2021 at 11:44 PM anatoly techtonik <techtonik@gmail.com> wrote: > > On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > > > > I think Elijah means that in the general case people are using fast > > export/import to export/import between different systems or in > > combination with a utility like git-filter-repo. > > > > In those cases users are also changing the content of the repository, so > > the hashes will change, invalidating signatures. > > > > But there's also cases where e.g. you don't modify the history, or only > > part of it, and could then preserve these headers. I think there's no > > inherent reason not to do so, just that nobody's cared enough to submit > > patches etc. > > Is fast-export/import the only way to filter information in `git`? Maybe there > is a slow json-export/import tool that gives a complete representation of all > events in a repository? Or API that can be used to serialize and import that > stream? > > If no, then I'd like to take a look at where header filtering and serialization > takes place. My C skills are at the "hello world" level, so I am not sure I can > write a patch. But I can write the logic in Python and ask somebody to port > that. If you are intent on keeping signatures because you know they are still valid, then you already know you aren't modifying any blobs/trees/commits leading up to those signatures. 
If that is the case, perhaps you should just avoid exporting the signature or anything it depends on, and just export the stuff after that point. You can do this with fast-export's --reference-excluded-parents option and pass it an exclusion range. For example: git fast-export --reference-excluded-parents ^master~5 --all and then pipe that through fast-import. In general, I think if fast-export or fast-import are lacking features you want, we should add them there, but I don't see how adding signature reading to fast-import and signature exporting to fast-export makes sense in general. Even if you assume fast-import can process all the bits it is sent (e.g. you extend it to support commits without an author, tags without a tagger, signed objects, any other extended commit headers), and even if you add flags to fast-export to die if there are any bits it doesn't recognize and to export all pieces of blobs/trees/tags (e.g. don't add missing authors, don't re-encode messages in UTF-8, don't use grafts or replace objects, keep extended headers such as signatures, etc.), then it still couldn't possibly work in all cases in general. For example, if you had a repository with unusual objects made by ancient or broken git versions (such as tree entries in the wrong sort order, or tree entries that recorded modes of 040000 instead of 40000 for trees or something with perms other than 100644 or 100755 for files), then when fast-import goes to recreate these objects using the canonical format they will no longer have the same hash and your commit signatures will get invalidated. Other git commands will also refuse to create objects with those oddities, even if git accepts ancient objects that have them. So, it's basically impossible to have a "complete representation of all events in a repository" that do what you want except for the *original* binary format. (But if you really want to see the original binary format, maybe `git cat-file --batch` will be handy to you.) 
But I think fast-export's --reference-excluded-parents might come in handy for you and let you do what you want. ^ permalink raw reply [flat|nested] 21+ messages in thread
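The `git cat-file --batch` suggestion above can be illustrated with a toy repository (names illustrative): the output is a `<oid> <type> <size>` header followed by the object's raw bytes, i.e. the content the hash was computed over (minus the `<type> <size>\0` prefix), so legacy quirks in old objects survive verbatim.

```shell
# Sketch: git cat-file --batch emits "<oid> <type> <size>" then the raw
# object bytes, preserving any legacy oddities exactly as stored.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'peek at raw bytes'
hdr=$(printf 'HEAD\n' | git cat-file --batch | head -n 1)
echo "$hdr"
raw=$(git cat-file commit HEAD)   # same bytes without the batch header
```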
* Re: Round-tripping fast-export/import changes commit hashes 2021-03-01 18:06 ` Elijah Newren @ 2021-03-01 20:04 ` Ævar Arnfjörð Bjarmason 2021-03-01 20:17 ` Elijah Newren 2021-03-02 22:12 ` anatoly techtonik 1 sibling, 1 reply; 21+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-01 20:04 UTC (permalink / raw) To: Elijah Newren; +Cc: anatoly techtonik, Git Mailing List On Mon, Mar 01 2021, Elijah Newren wrote: > On Sun, Feb 28, 2021 at 11:44 PM anatoly techtonik <techtonik@gmail.com> wrote: >> >> On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason >> <avarab@gmail.com> wrote: >> > >> > I think Elijah means that in the general case people are using fast >> > export/import to export/import between different systems or in >> > combination with a utility like git-filter-repo. >> > >> > In those cases users are also changing the content of the repository, so >> > the hashes will change, invalidating signatures. >> > >> > But there's also cases where e.g. you don't modify the history, or only >> > part of it, and could then preserve these headers. I think there's no >> > inherent reason not to do so, just that nobody's cared enough to submit >> > patches etc. >> >> Is fast-export/import the only way to filter information in `git`? Maybe there >> is a slow json-export/import tool that gives a complete representation of all >> events in a repository? Or API that can be used to serialize and import that >> stream? >> >> If no, then I'd like to take a look at where header filtering and serialization >> takes place. My C skills are at the "hello world" level, so I am not sure I can >> write a patch. But I can write the logic in Python and ask somebody to port >> that. > > If you are intent on keeping signatures because you know they are > still valid, then you already know you aren't modifying any > blobs/trees/commits leading up to those signatures. 
If that is the > case, perhaps you should just avoid exporting the signature or > anything it depends on, and just export the stuff after that point. > You can do this with fast-export's --reference-excluded-parents option > and pass it an exclusion range. For example: > > git fast-export --reference-excluded-parents ^master~5 --all > > and then pipe that through fast-import. > > > In general, I think if fast-export or fast-import are lacking features > you want, we should add them there, but I don't see how adding > signature reading to fast-import and signature exporting to > fast-export makes sense in general. Even if you assume fast-import > can process all the bits it is sent (e.g. you extend it to support > commits without an author, tags without a tagger, signed objects, any > other extended commit headers), and even if you add flags to > fast-export to die if there are any bits it doesn't recognize and to > export all pieces of blobs/trees/tags (e.g. don't add missing authors, > don't re-encode messages in UTF-8, don't use grafts or replace > objects, keep extended headers such as signatures, etc.), then it > still couldn't possibly work in all cases in general. For example, if > you had a repository with unusual objects made by ancient or broken > git versions (such as tree entries in the wrong sort order, or tree > entries that recorded modes of 040000 instead of 40000 for trees or > something with perms other than 100644 or 100755 for files), then when > fast-import goes to recreate these objects using the canonical format > they will no longer have the same hash and your commit signatures will > get invalidated. Other git commands will also refuse to create > objects with those oddities, even if git accepts ancient objects that > have them. > > So, it's basically impossible to have a "complete representation of > all events in a repository" that do what you want except for the > *original* binary format. 
(But if you really want to see the original > binary format, maybe `git cat-file --batch` will be handy to you.) > > But I think fast-export's --reference-excluded-parents might come in > handy for you and let you do what you want. ...to add to that line of thinking, it's also a completely valid technique to just completele rewrite your repository, then (re-)push the old signed tags to refs/tags/*. By default they won't be pulled down as they won't reference commits on branches you're fetching, and you can also stick them somewhere else than refs/tags/*, e.g. refs/legacy-tags/*. None of the commit history will be the same, but the content (mostly) will, which is usually what matters when checking out an old tag. Of course this hack has little benefit over just keeping a foo-old.git repo around, and moving on with new history in your new foo.git. ^ permalink raw reply [flat|nested] 21+ messages in thread
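The re-push technique above can be sketched as follows (a toy repository; names are illustrative): old signed tags are pushed into a side namespace such as refs/legacy-tags/*, which an ordinary fetch will not pull down by default.

```shell
# Sketch: park an old tag under refs/legacy-tags/* in the destination,
# outside the namespaces a default fetch retrieves. Names illustrative.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/src"
cd "$tmp/src"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'old history'
git tag v1
git init -q --bare "$tmp/dest"
git push -q "$tmp/dest" refs/tags/v1:refs/legacy-tags/v1
# The ref is there, but lives outside refs/tags/ and refs/heads/
found=$(git ls-remote "$tmp/dest" 'refs/legacy-tags/*' \
    | grep -c 'refs/legacy-tags/v1')
echo "legacy tag refs: $found"
```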
* Re: Round-tripping fast-export/import changes commit hashes 2021-03-01 20:04 ` Ævar Arnfjörð Bjarmason @ 2021-03-01 20:17 ` Elijah Newren 0 siblings, 0 replies; 21+ messages in thread From: Elijah Newren @ 2021-03-01 20:17 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: anatoly techtonik, Git Mailing List On Mon, Mar 1, 2021 at 12:04 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Mon, Mar 01 2021, Elijah Newren wrote: > > > On Sun, Feb 28, 2021 at 11:44 PM anatoly techtonik <techtonik@gmail.com> wrote: > >> > >> On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason > >> <avarab@gmail.com> wrote: > >> > > >> > I think Elijah means that in the general case people are using fast > >> > export/import to export/import between different systems or in > >> > combination with a utility like git-filter-repo. > >> > > >> > In those cases users are also changing the content of the repository, so > >> > the hashes will change, invalidating signatures. > >> > > >> > But there's also cases where e.g. you don't modify the history, or only > >> > part of it, and could then preserve these headers. I think there's no > >> > inherent reason not to do so, just that nobody's cared enough to submit > >> > patches etc. > >> > >> Is fast-export/import the only way to filter information in `git`? Maybe there > >> is a slow json-export/import tool that gives a complete representation of all > >> events in a repository? Or API that can be used to serialize and import that > >> stream? > >> > >> If no, then I'd like to take a look at where header filtering and serialization > >> takes place. My C skills are at the "hello world" level, so I am not sure I can > >> write a patch. But I can write the logic in Python and ask somebody to port > >> that. > > > > If you are intent on keeping signatures because you know they are > > still valid, then you already know you aren't modifying any > > blobs/trees/commits leading up to those signatures. 
If that is the > > case, perhaps you should just avoid exporting the signature or > > anything it depends on, and just export the stuff after that point. > > You can do this with fast-export's --reference-excluded-parents option > > and pass it an exclusion range. For example: > > > > git fast-export --reference-excluded-parents ^master~5 --all > > > > and then pipe that through fast-import. > > > > > > In general, I think if fast-export or fast-import are lacking features > > you want, we should add them there, but I don't see how adding > > signature reading to fast-import and signature exporting to > > fast-export makes sense in general. Even if you assume fast-import > > can process all the bits it is sent (e.g. you extend it to support > > commits without an author, tags without a tagger, signed objects, any > > other extended commit headers), and even if you add flags to > > fast-export to die if there are any bits it doesn't recognize and to > > export all pieces of blobs/trees/tags (e.g. don't add missing authors, > > don't re-encode messages in UTF-8, don't use grafts or replace > > objects, keep extended headers such as signatures, etc.), then it > > still couldn't possibly work in all cases in general. For example, if > > you had a repository with unusual objects made by ancient or broken > > git versions (such as tree entries in the wrong sort order, or tree > > entries that recorded modes of 040000 instead of 40000 for trees or > > something with perms other than 100644 or 100755 for files), then when > > fast-import goes to recreate these objects using the canonical format > > they will no longer have the same hash and your commit signatures will > > get invalidated. Other git commands will also refuse to create > > objects with those oddities, even if git accepts ancient objects that > > have them. 
> > > > So, it's basically impossible to have a "complete representation of > > all events in a repository" that do what you want except for the > > *original* binary format. (But if you really want to see the original > > binary format, maybe `git cat-file --batch` will be handy to you.) > > > > But I think fast-export's --reference-excluded-parents might come in > > handy for you and let you do what you want. > > ...to add to that line of thinking, it's also a completely valid > technique to just completele rewrite your repository, then (re-)push the > old signed tags to refs/tags/*. The repository in question didn't have any signed tags, just a signed commit. > By default they won't be pulled down as they won't reference commits on > branches you're fetching, and you can also stick them somewhere else > than refs/tags/*, e.g. refs/legacy-tags/*. > > None of the commit history will be the same, but the content (mostly) > will, which is usually what matters when checking out an old tag. > > Of course this hack has little benefit over just keeping a foo-old.git > repo around, and moving on with new history in your new foo.git. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Round-tripping fast-export/import changes commit hashes 2021-03-01 18:06 ` Elijah Newren 2021-03-01 20:04 ` Ævar Arnfjörð Bjarmason @ 2021-03-02 22:12 ` anatoly techtonik 1 sibling, 0 replies; 21+ messages in thread From: anatoly techtonik @ 2021-03-02 22:12 UTC (permalink / raw) To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List On Mon, Mar 1, 2021 at 9:06 PM Elijah Newren <newren@gmail.com> wrote: > On Sun, Feb 28, 2021 at 11:44 PM anatoly techtonik <techtonik@gmail.com> wrote: > For example: > > git fast-export --reference-excluded-parents ^master~5 --all > > and then pipe that through fast-import. That may come in handy, but if certain parents are excluded, it will be impossible to find them to reference and attach branches to them. > Other git commands will also refuse to create > objects with those oddities, even if git accepts ancient objects that > have them. Are there any `lint` commands that can detect and warn about those oddities? > (But if you really want to see the original > binary format, maybe `git cat-file --batch` will be handy to you.) Looks good. Is there a way to import it back? And how hard would it be to write a parser for it? Is there a specification for its fields? -- anatoly t. ^ permalink raw reply [flat|nested] 21+ messages in thread
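The `lint` question above goes unanswered in the thread; the closest built-in check is `git fsck`, whose `--strict` mode reports oddities such as zero-padded tree entry modes in legacy objects. A minimal sketch on a well-formed toy repository (names illustrative):

```shell
# Sketch: git fsck --strict is the closest built-in "lint" for object
# oddities; a well-formed repository passes silently. Names illustrative.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'well-formed history'
if git fsck --strict >/dev/null 2>&1; then ok=1; else ok=0; fi
echo "fsck clean: $ok"
```

Note, though, that fsck checks object well-formedness, not whether an object would survive a fast-export/fast-import canonicalization round-trip.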
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-01  7:44       ` anatoly techtonik
  2021-03-01 17:34         ` Junio C Hamano
  2021-03-01 18:06         ` Elijah Newren
@ 2021-03-01 20:02         ` Ævar Arnfjörð Bjarmason
  2021-03-02 22:23           ` anatoly techtonik
  2 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-01 20:02 UTC (permalink / raw)
  To: anatoly techtonik; +Cc: Elijah Newren, Git Mailing List

On Mon, Mar 01 2021, anatoly techtonik wrote:

> On Sun, Feb 28, 2021 at 1:34 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> I think Elijah means that in the general case people are using fast
>> export/import to export/import between different systems or in
>> combination with a utility like git-filter-repo.
>>
>> In those cases users are also changing the content of the repository, so
>> the hashes will change, invalidating signatures.
>>
>> But there's also cases where e.g. you don't modify the history, or only
>> part of it, and could then preserve these headers. I think there's no
>> inherent reason not to do so, just that nobody's cared enough to submit
>> patches etc.
>
> Is fast-export/import the only way to filter information in `git`? Maybe
> there is a slow json-export/import tool that gives a complete
> representation of all events in a repository? Or API that can be used to
> serialize and import that stream?

Aside from other things mentioned & any issues in fast export/import in
this thread, if you want round-trip correctness you're not going to want
JSON-anything. It's not capable of representing arbitrary binary data.

But in any case, it's not the fast-export format that's the issue, but
how the tools in git.git are munging/rewriting/omitting the repository
data in question...
* Re: Round-tripping fast-export/import changes commit hashes
  2021-03-01 20:02         ` Ævar Arnfjörð Bjarmason
@ 2021-03-02 22:23           ` anatoly techtonik
  0 siblings, 0 replies; 21+ messages in thread
From: anatoly techtonik @ 2021-03-02 22:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Elijah Newren, Git Mailing List

On Mon, Mar 1, 2021 at 11:02 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Aside from other things mentioned & any issues in fast export/import in
> this thread, if you want round-trip correctness you're not going to want
> JSON-anything. It's not capable of representing arbitrary binary data.

Yes, binary data would need to be explicitly represented in base64 or a
similar encoding, just as ordinary strings need escape symbols.
-- 
anatoly t.
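A minimal illustration of that point (added here, not part of the thread),
using the coreutils `base64` tool and a naively constructed JSON document;
the file names are made up:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Arbitrary binary data that a raw JSON string cannot carry
# (JSON strings must be valid Unicode text).
head -c 32 /dev/urandom > blob

# Encode to base64 text before embedding it in JSON; this costs
# roughly 33% in size but is lossless.
printf '{"data":"%s"}' "$(base64 blob)" > doc.json

# Decode on the way back out (JSON "parsed" with sed only because
# the document shape is fixed in this sketch).
sed 's/{"data":"//; s/"}//' doc.json | base64 -d > restored

cmp blob restored && echo round-trip-ok
```

Escaping is still needed either way; base64 just moves the problem from
per-character escapes to a uniform text encoding of the whole payload.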
end of thread, other threads:[~2023-01-13 7:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-27 12:31 Round-tripping fast-export/import changes commit hashes anatoly techtonik
2021-02-27 17:48 ` Elijah Newren
2021-02-28 10:00   ` anatoly techtonik
2021-02-28 10:34     ` Ævar Arnfjörð Bjarmason
2021-03-01  7:44       ` anatoly techtonik
2021-03-01 17:34         ` Junio C Hamano
2021-03-02 21:52           ` anatoly techtonik
2021-03-03  7:13             ` Johannes Sixt
2021-03-04  0:55               ` Junio C Hamano
2021-08-09 15:45                 ` anatoly techtonik
2021-08-09 18:15                   ` Elijah Newren
2021-08-10 15:51                     ` anatoly techtonik
2021-08-10 17:57                       ` Elijah Newren
2022-12-11 18:30                         ` anatoly techtonik
2023-01-13  7:21                           ` Elijah Newren
2021-03-01 18:06         ` Elijah Newren
2021-03-01 20:04           ` Ævar Arnfjörð Bjarmason
2021-03-01 20:17             ` Elijah Newren
2021-03-02 22:12           ` anatoly techtonik
2021-03-01 20:02       ` Ævar Arnfjörð Bjarmason
2021-03-02 22:23         ` anatoly techtonik
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).