* Fetch-hooks @ 2018-02-07 21:56 Leo Gaspard 2018-02-07 22:51 ` Fetch-hooks Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-07 21:56 UTC (permalink / raw) To: git Hello, tl;dr: Is there currently a way to have fetch hooks, and if not do you think it could be a nice feature? I was in the process of implementing hooks for git that ensure the repository is always cleanly signed by someone allowed to by the repository itself. I think I've completed the signature-checking part [1] and the push hook [2] (even though it isn't really configurable at the moment). However, I was starting to think about handling the fetch step, and couldn't find any fetch hook. Is there one? If not, would you think it is would be a good idea to add one, that would eg. be passed the commit-before, commit-after and could block the changing of the reference if it failed? The only other solution I could think of is using a separate script for fetching, but that would be fragile, as the user could always not think about it well and run a git fetch, breaking the objective that after the first clone all commits were correctly signature-checked. Thanks for reading me! Leo PS1: I am not subscribed to the ML. PS2: I've tried asking freenode#git, without success so far. [1] https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-07 21:56 Fetch-hooks Leo Gaspard @ 2018-02-07 22:51 ` Ævar Arnfjörð Bjarmason 2018-02-08 0:06 ` Fetch-hooks Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-02-07 22:51 UTC (permalink / raw) To: Leo Gaspard; +Cc: git On Wed, Feb 07 2018, Leo Gaspard jotted: > Hello, > > tl;dr: Is there currently a way to have fetch hooks, and if not do you > think it could be a nice feature? > > I was in the process of implementing hooks for git that ensure the > repository is always cleanly signed by someone allowed to by the > repository itself. I think I've completed the signature-checking part > [1] and the push hook [2] (even though it isn't really configurable at > the moment). > > However, I was starting to think about handling the fetch step, and > couldn't find any fetch hook. Is there one? > > If not, would you think it is would be a good idea to add one, that > would eg. be passed the commit-before, commit-after and could block the > changing of the reference if it failed? > > The only other solution I could think of is using a separate script for > fetching, but that would be fragile, as the user could always not think > about it well and run a git fetch, breaking the objective that after the > first clone all commits were correctly signature-checked. > > Thanks for reading me! > Leo > > PS1: I am not subscribed to the ML. > > PS2: I've tried asking freenode#git, without success so far. > > > [1] > https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh > > [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push There is no fetch hook, however you may find that the post-{checkout,merge} hooks are suitable for what you want to do. Setting those to some custom comand is a common pattern for e.g. compiling some assets on "git pull", so you could similarly check the commits from HEAD, of course those are post-* hooks, so they won't stop the checkout. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-07 22:51 ` Fetch-hooks Ævar Arnfjörð Bjarmason @ 2018-02-08 0:06 ` Leo Gaspard 2018-02-08 15:30 ` Fetch-hooks Joey Hess 0 siblings, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-08 0:06 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, joey On 02/07/2018 11:51 PM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Feb 07 2018, Leo Gaspard jotted: > >> Hello, >> >> tl;dr: Is there currently a way to have fetch hooks, and if not do you >> think it could be a nice feature? >> >> I was in the process of implementing hooks for git that ensure the >> repository is always cleanly signed by someone allowed to by the >> repository itself. I think I've completed the signature-checking part >> [1] and the push hook [2] (even though it isn't really configurable at >> the moment). >> >> However, I was starting to think about handling the fetch step, and >> couldn't find any fetch hook. Is there one? >> >> If not, would you think it is would be a good idea to add one, that >> would eg. be passed the commit-before, commit-after and could block the >> changing of the reference if it failed? >> >> The only other solution I could think of is using a separate script for >> fetching, but that would be fragile, as the user could always not think >> about it well and run a git fetch, breaking the objective that after the >> first clone all commits were correctly signature-checked. >> >> Thanks for reading me! >> Leo >> >> PS1: I am not subscribed to the ML. >> >> PS2: I've tried asking freenode#git, without success so far. >> >> >> [1] >> https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh >> >> [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push > > There is no fetch hook, however you may find that the > post-{checkout,merge} hooks are suitable for what you want to do. > > Setting those to some custom comand is a common pattern for > e.g. compiling some assets on "git pull", so you could similarly check > the commits from HEAD, of course those are post-* hooks, so they won't > stop the checkout. Hmm, I don't think these would fit the bill. For post-merge, simply because I spend my life rebasing stuff around, and very rarely merge. For post-checkout, it could work, but then I'd need to keep track manually of up to where the commits have been checked and to search the git graph for the latest checked ancestor (as otherwise checking-out another branch then checking-out the first branch again would likely trigger a failure, due to the keyring being dynamic), so it would likely be a dealbreaker, due to the hook becoming too complex to be trusted. (Just in case you wonder, by “the keyring being dynamic” I mean the PGP keys allowed to sign commits are stored directly inside the git repository) That said, I just came upon [1] (esp. the description [2] and the patch [3]), and wondered: it looks like the patch was abandoned midway in favor of a hook refactoring. Would you happen to know whether the hook refactoring eventually took place, and/or whether this patch was resubmitted later, and/or whether it would still be possible to merge this now? (not having any experience with git's internals yet, I don't really know whether these are stupid questions or not) Thanks! Leo PS: Cc'ing Joey, as you most likely know best what eventually happened, if you can remember it? [1] https://marc.info/?t=132477041500001&r=1&w=2 [2] https://marc.info/?l=git&m=132483581218382&w=2 [3] https://marc.info/?l=git&m=132486687023893&w=2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-08 0:06 ` Fetch-hooks Leo Gaspard @ 2018-02-08 15:30 ` Joey Hess 2018-02-08 17:02 ` Fetch-hooks Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Joey Hess @ 2018-02-08 15:30 UTC (permalink / raw) To: Leo Gaspard; +Cc: Ævar Arnfjörð Bjarmason, git [-- Attachment #1: Type: text/plain, Size: 1363 bytes --] Leo Gaspard wrote: > That said, I just came upon [1] (esp. the description [2] and the patch > [3]), and wondered: it looks like the patch was abandoned midway in > favor of a hook refactoring. Would you happen to know whether the hook > refactoring eventually took place, and/or whether this patch was > resubmitted later, and/or whether it would still be possible to merge > this now? (not having any experience with git's internals yet, I don't > really know whether these are stupid questions or not) > > PS: Cc'ing Joey, as you most likely know best what eventually happened, > if you can remember it? I don't remember it well, but reviewing the thread, I think it foundered on this comment by Junio: > That use case sounds like that "git fetch" is called as a first class UI, > which is covered by "git myfetch" (you can call it "git annex fetch") > wrapper approach, the canonical example of a hook that we explicitly do ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > not want to add. ^^^^^^^^^^^^^^^ While I still think a fetch hook would be a good idea for reasons of composability, I then just went off and implemented such a wrapper for my own particular use case, and the wrapper program then grew to cover use cases that a hook would not have been able to cover, so ... -- see shy jo [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-08 15:30 ` Fetch-hooks Joey Hess @ 2018-02-08 17:02 ` Leo Gaspard 2018-02-08 21:06 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-09 19:12 ` Fetch-hooks Leo Gaspard 0 siblings, 2 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-08 17:02 UTC (permalink / raw) To: Joey Hess; +Cc: Ævar Arnfjörð Bjarmason, git On 02/08/2018 04:30 PM, Joey Hess wrote: > Leo Gaspard wrote: >> That said, I just came upon [1] (esp. the description [2] and the patch >> [3]), and wondered: it looks like the patch was abandoned midway in >> favor of a hook refactoring. Would you happen to know whether the hook >> refactoring eventually took place, and/or whether this patch was >> resubmitted later, and/or whether it would still be possible to merge >> this now? (not having any experience with git's internals yet, I don't >> really know whether these are stupid questions or not) >> >> PS: Cc'ing Joey, as you most likely know best what eventually happened, >> if you can remember it? > > I don't remember it well, but reviewing the thread, I think it foundered > on this comment by Junio: > >> That use case sounds like that "git fetch" is called as a first class UI, >> which is covered by "git myfetch" (you can call it "git annex fetch") >> wrapper approach, the canonical example of a hook that we explicitly do > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> not want to add. > ^^^^^^^^^^^^^^^ > > While I still think a fetch hook would be a good idea for reasons of > composability, I then just went off and implemented such a wrapper for > my own particular use case, and the wrapper program then grew to cover > use cases that a hook would not have been able to cover, so ... Hmm, OK, so I guess I'll try to update the patch when I get some time to delve into git's internals, as my use case (forbidding some fetches) couldn't afaik be covered by a wrapper hook. Thanks for the feedback! Leo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-08 17:02 ` Fetch-hooks Leo Gaspard @ 2018-02-08 21:06 ` Ævar Arnfjörð Bjarmason 2018-02-08 22:18 ` Fetch-hooks Leo Gaspard 2018-02-09 19:12 ` Fetch-hooks Leo Gaspard 1 sibling, 1 reply; 38+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-02-08 21:06 UTC (permalink / raw) To: Leo Gaspard; +Cc: Joey Hess, git On Thu, Feb 08 2018, Leo Gaspard jotted: > On 02/08/2018 04:30 PM, Joey Hess wrote: >> Leo Gaspard wrote: >>> That said, I just came upon [1] (esp. the description [2] and the patch >>> [3]), and wondered: it looks like the patch was abandoned midway in >>> favor of a hook refactoring. Would you happen to know whether the hook >>> refactoring eventually took place, and/or whether this patch was >>> resubmitted later, and/or whether it would still be possible to merge >>> this now? (not having any experience with git's internals yet, I don't >>> really know whether these are stupid questions or not) >>> >>> PS: Cc'ing Joey, as you most likely know best what eventually happened, >>> if you can remember it? >> >> I don't remember it well, but reviewing the thread, I think it foundered >> on this comment by Junio: >> >>> That use case sounds like that "git fetch" is called as a first class UI, >>> which is covered by "git myfetch" (you can call it "git annex fetch") >>> wrapper approach, the canonical example of a hook that we explicitly do >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >>> not want to add. >> ^^^^^^^^^^^^^^^ >> >> While I still think a fetch hook would be a good idea for reasons of >> composability, I then just went off and implemented such a wrapper for >> my own particular use case, and the wrapper program then grew to cover >> use cases that a hook would not have been able to cover, so ... > > Hmm, OK, so I guess I'll try to update the patch when I get some time to > delve into git's internals, as my use case (forbidding some fetches) > couldn't afaik be covered by a wrapper hook. Per my reading of https://public-inbox.org/git/20111224234212.GA21533@gnu.kitenet.net/ what Joey implemented is not what you described in your initial mail. His is a *post*-fetch hook, we've already done the fetch and are just telling you as a courtesy what refs changed. You could also implement this as some cronjob that polls git for-each-ref but it's easier as a hook, fine. What you're describing is something like a pre-fetch hook analogous to the pre-receive hooks, where you're fed refs updated on the remote on stdin, and can say you don't want some of those to be updated. This may just be a lack of imagination on my part, but I don't see how that's sensible at all. The refs we fetch are our *copy* of the remote refs, why would you not want to track the upstream remote. You're going to refuse some branches and what? Be further behind until some point in the future where the tip is GPG-signed and you accept it, at which poich you'll need to do more work than if you were up-to-date with the almost-GPG-signed version? I think you're confusing two things here. One is the reasonable concern of wanting to not have your local copy of remote refs have undesirable content, but a pre-fetch hook is not the way to accomplish that. The other is e.g. to ensure that you never locally check out some "bad" ref, we don't have hook support for that, but could add it, e.g. git-checkout and git reset --hard could be taught about some pre-checkout hook. You could also have some intermediate step between these two, where e.g. your refspec for "origin" is "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that location, then you move them (with some alias/hook) to "refs/remotes/origin/*" once they're seen to be "OK". ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-08 21:06 ` Fetch-hooks Ævar Arnfjörð Bjarmason @ 2018-02-08 22:18 ` Leo Gaspard 2018-02-09 22:04 ` Fetch-hooks Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-08 22:18 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Joey Hess, git On 02/08/2018 10:06 PM, Ævar Arnfjörð Bjarmason wrote:>> Hmm, OK, so I guess I'll try to update the patch when I get some time to >> delve into git's internals, as my use case (forbidding some fetches) >> couldn't afaik be covered by a wrapper hook. > > Per my reading of > https://public-inbox.org/git/20111224234212.GA21533@gnu.kitenet.net/ > what Joey implemented is not what you described in your initial mail. > > His is a *post*-fetch hook, we've already done the fetch and are just > telling you as a courtesy what refs changed. You could also implement > this as some cronjob that polls git for-each-ref but it's easier as a > hook, fine. I was thinking along the lines of https://marc.info/?l=git&m=132486687023893&w=2 with high-level description at https://marc.info/?l=git&m=132480559712592&w=2 With the high-level description given here, I'm pretty sure I can hack a hook together to make things work as I want them to. > What you're describing is something like a pre-fetch hook analogous to > the pre-receive hooks, where you're fed refs updated on the remote on > stdin, and can say you don't want some of those to be updated. > > This may just be a lack of imagination on my part, but I don't see how > that's sensible at all. > > The refs we fetch are our *copy* of the remote refs, why would you not > want to track the upstream remote. You're going to refuse some branches > and what? Be further behind until some point in the future where the tip > is GPG-signed and you accept it, at which poich you'll need to do more > work than if you were up-to-date with the almost-GPG-signed version? That's about it. I want all fetching to be blocked in case of the tip not being signed. As there is a pre-push hook ensuring committers don't forget to sign before pushing, the only case the tip could not be signed is in case of an attack, which means it's better to just force-push master because any git repo that fetched it is doomed anyway. Definitely would not want to allow an untrusted revision get into anything that could even remotely be taken as “endorsed” by the user. (BTW, in order to avoid the case of someone forgetting to sign the commit and not having installed the pre-push hook, there can be holes in the commit-signing chain, the drawback being that the committer pushing a signed commit takes responsibility for all unsigned commits directly preceding his -- allowing them to recover in case of a mistaken push) > I think you're confusing two things here. One is the reasonable concern > of wanting to not have your local copy of remote refs have undesirable > content, but a pre-fetch hook is not the way to accomplish that. Well, a pre-fetch hook is a possible way of accomplishing that, and I don't know of any better one? > The other is e.g. to ensure that you never locally check out some "bad" > ref, we don't have hook support for that, but could add it, > e.g. git-checkout and git reset --hard could be taught about some > pre-checkout hook. Issue is, once we have to fix checkout and reset, all other commands that potentially touch the worktree also have to be fixed (eg. I don't know whether worktree add triggers pre-checkout?) Also, this requires the hook to store a database of all the paths that have been checked, because there is no logic in how one may choose to checkout the repo. While having a tweak-fetch hook would make the implementation straightforward, because at the time of invoking the hook the “refname at remote” commit is already trusted, and the “object name” is the commit whose validity we want to check, so we just have to check the path between those two. (I don't know if you checked my current scripts, but basically as the set of allowed PGP keys can change at any commit, it's only possible to check a commit path, not a single commit out-of-nowhere) The only issue that could arise with a tweak-fetch hook is in case of a force-fetch (and even maybe it's not even an actual issue, I haven't given it real thought yet), but this can reasonably be banned, as once a commit is signed it enters the “real” master branch, that should never be moved backward, as it can't be the sign of an attack. > You could also have some intermediate step between these two, where > e.g. your refspec for "origin" is > "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default > "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that > location, then you move them (with some alias/hook) to > "refs/remotes/origin/*" once they're seen to be "OK". That is indeed another possibility, but then the idea is to make things as transparent as possible for the end-user, not to completely change their git workflow. As such, considering only signed commits to be part of the upstream seems to make sense to me? Cheers, Leo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-08 22:18 ` Fetch-hooks Leo Gaspard @ 2018-02-09 22:04 ` Ævar Arnfjörð Bjarmason 2018-02-09 22:24 ` Fetch-hooks Leo Gaspard 2018-02-09 22:30 ` Fetch-hooks Jeff King 0 siblings, 2 replies; 38+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-02-09 22:04 UTC (permalink / raw) To: Leo Gaspard; +Cc: Joey Hess, git, Jeff King, Brandon Williams On Thu, Feb 08 2018, Leo Gaspard jotted: > On 02/08/2018 10:06 PM, Ævar Arnfjörð Bjarmason wrote:>> Hmm, OK, so I > guess I'll try to update the patch when I get some time to >>> delve into git's internals, as my use case (forbidding some fetches) >>> couldn't afaik be covered by a wrapper hook. >> >> Per my reading of >> https://public-inbox.org/git/20111224234212.GA21533@gnu.kitenet.net/ >> what Joey implemented is not what you described in your initial mail. >> >> His is a *post*-fetch hook, we've already done the fetch and are just >> telling you as a courtesy what refs changed. You could also implement >> this as some cronjob that polls git for-each-ref but it's easier as a >> hook, fine. > > I was thinking along the lines of > https://marc.info/?l=git&m=132486687023893&w=2 > with high-level description at > https://marc.info/?l=git&m=132480559712592&w=2 > > With the high-level description given here, I'm pretty sure I can hack a > hook together to make things work as I want them to. > >> What you're describing is something like a pre-fetch hook analogous to >> the pre-receive hooks, where you're fed refs updated on the remote on >> stdin, and can say you don't want some of those to be updated. >> >> This may just be a lack of imagination on my part, but I don't see how >> that's sensible at all. >> >> The refs we fetch are our *copy* of the remote refs, why would you not >> want to track the upstream remote. You're going to refuse some branches >> and what? Be further behind until some point in the future where the tip >> is GPG-signed and you accept it, at which poich you'll need to do more >> work than if you were up-to-date with the almost-GPG-signed version? > > That's about it. I want all fetching to be blocked in case of the tip > not being signed. As there is a pre-push hook ensuring committers don't > forget to sign before pushing, the only case the tip could not be signed > is in case of an attack, which means it's better to just force-push > master because any git repo that fetched it is doomed anyway. Definitely > would not want to allow an untrusted revision get into anything that > could even remotely be taken as “endorsed” by the user. > > (BTW, in order to avoid the case of someone forgetting to sign the > commit and not having installed the pre-push hook, there can be holes in > the commit-signing chain, the drawback being that the committer pushing > a signed commit takes responsibility for all unsigned commits directly > preceding his -- allowing them to recover in case of a mistaken push) > >> I think you're confusing two things here. One is the reasonable concern >> of wanting to not have your local copy of remote refs have undesirable >> content, but a pre-fetch hook is not the way to accomplish that. > > Well, a pre-fetch hook is a possible way of accomplishing that, and I > don't know of any better one? > >> The other is e.g. to ensure that you never locally check out some "bad" >> ref, we don't have hook support for that, but could add it, >> e.g. git-checkout and git reset --hard could be taught about some >> pre-checkout hook. > > Issue is, once we have to fix checkout and reset, all other commands > that potentially touch the worktree also have to be fixed (eg. I don't > know whether worktree add triggers pre-checkout?) > > Also, this requires the hook to store a database of all the paths that > have been checked, because there is no logic in how one may choose to > checkout the repo. While having a tweak-fetch hook would make the > implementation straightforward, because at the time of invoking the hook > the “refname at remote” commit is already trusted, and the “object name” > is the commit whose validity we want to check, so we just have to check > the path between those two. (I don't know if you checked my current > scripts, but basically as the set of allowed PGP keys can change at any > commit, it's only possible to check a commit path, not a single commit > out-of-nowhere) Right, it's certainly the case that refusing the refs is a more effective gatekeeper, we're not going to have hooks for all the low-level stuff that might inspect sha1s or check them out. My assumption was that hooking into just a couple of things would be good enough, but yes, there's other trade-offs. > The only issue that could arise with a tweak-fetch hook is in case of a > force-fetch (and even maybe it's not even an actual issue, I haven't > given it real thought yet), but this can reasonably be banned, as once a > commit is signed it enters the “real” master branch, that should never > be moved backward, as it can't be the sign of an attack. > >> You could also have some intermediate step between these two, where >> e.g. your refspec for "origin" is >> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default >> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that >> location, then you move them (with some alias/hook) to >> "refs/remotes/origin/*" once they're seen to be "OK". > > That is indeed another possibility, but then the idea is to make things > as transparent as possible for the end-user, not to completely change > their git workflow. As such, considering only signed commits to be part > of the upstream seems to make sense to me? I mean this would be something that would be part of a post-fetch hook, so it would be as transparent as what you're doing to the user, with the difference that it doesn't need to involve changes to what you slurp down from the server. I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run your post-fetch hook and you go over the new refs, and copy what you like (usually everything) to refs/remotes/origin/*. You get the same thing, the logic just becomes inspecting something locally and copying it over. There are other caveats, notably that the local ref store now needs to store 2x the amount of branches, unless we're smarter about it and only store stuff in refs/remotes/origin-untrusted/ that's different from refs/remotes/origin/, as a sort of ref staging area. Anyway, I don't have strong opinions on this, just thought it was worth discussing the whys. One thing that's not discussed yet, and I know just enough about for it to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & Brandon who know more) is that this feature once shipped might cause higher load on git hosting providers. This is because people will inevitably use it in popular projects for some custom filtering, and because you're continually re-fetching and inspecting stuff what used to be a really cheap no-op "pull" most of the time is a more expensive negotiation every time before the client rejects the refs again, and worse for hosting providers because you have bespoke ref fetching strategies you have less odds of being able to cache both the negotiation and the pack you serve. I.e. you want this for some security feature where 99.99% of the time you accept all refs, but most people will probably use this to implement dynamic Turing-complete refspecs. Maybe that's worrying about nothing, but worth thinking about. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 22:04 ` Fetch-hooks Ævar Arnfjörð Bjarmason @ 2018-02-09 22:24 ` Leo Gaspard 2018-02-09 22:56 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-09 22:30 ` Fetch-hooks Jeff King 1 sibling, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-09 22:24 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Joey Hess, git, Jeff King, Brandon Williams On 02/09/2018 11:04 PM, Ævar Arnfjörð Bjarmason wrote:>>> You could also have some intermediate step between these two, where >>> e.g. your refspec for "origin" is >>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default >>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that >>> location, then you move them (with some alias/hook) to >>> "refs/remotes/origin/*" once they're seen to be "OK". >> >> That is indeed another possibility, but then the idea is to make things >> as transparent as possible for the end-user, not to completely change >> their git workflow. As such, considering only signed commits to be part >> of the upstream seems to make sense to me? > > I mean this would be something that would be part of a post-fetch hook, > so it would be as transparent as what you're doing to the user, with the > difference that it doesn't need to involve changes to what you slurp > down from the server. > > I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run > your post-fetch hook and you go over the new refs, and copy what you > like (usually everything) to refs/remotes/origin/*. Hmm... but that would then require a post-fetch hook, wouldn't it? And about a post-fetch hook, if I understood correctly, Junio in [1] had a quite nice argument against it: Although I do not deeply care between such a "trigger to only notify, no touching" hook and a full-blown "allow hook writers to easily lie about what happened in the fetch" hook, I was hoping that we would get this right and useful if we were spending our brain bandwidth on it. I am not very fond of an easier "trigger to only notify" hook because people are bound to misuse the interface and try updating the refs anyway, making it easy to introduce inconsistencies between refs and FETCH_HEAD that will confuse the later "merge" step. Otherwise, if it doesn't require a post-fetch hook, then it would require the end-user to first fetch, then run the `copy-trusted-refs-over` script, which would add stuff to the user's workflow. Did I miss another possibility? > [...] > > One thing that's not discussed yet, and I know just enough about for it > to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & > Brandon who know more) is that this feature once shipped might cause > higher load on git hosting providers. > > This is because people will inevitably use it in popular projects for > some custom filtering, and because you're continually re-fetching and > inspecting stuff what used to be a really cheap no-op "pull" most of the > time is a more expensive negotiation every time before the client > rejects the refs again, and worse for hosting providers because you have > bespoke ref fetching strategies you have less odds of being able to > cache both the negotiation and the pack you serve. > > I.e. you want this for some security feature where 99.99% of the time > you accept all refs, but most people will probably use this to implement > dynamic Turing-complete refspecs. > > Maybe that's worrying about nothing, but worth thinking about. Well... First, I must say I didn't really understand your last paragraph about Turing-complete refspecs. But my understanding of how the fetch-hook patchset I sent this evening works is that it first receives all the objects from the hosting provider, then locally moves the refs, but never actually discards the downloaded objects (well, until a `git gc` I guess). So I don't think the network traffic with the provider would be any different wrt. what it is now, even if a tweak-fetch hook rejects some commits? Then again I don't know git's internals enough to be have even a bit of certainty about what I'm saying right now, so... [1] https://marc.info/?l=git&m=132480559712592&w=2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 22:24 ` Fetch-hooks Leo Gaspard @ 2018-02-09 22:56 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 38+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-02-09 22:56 UTC (permalink / raw) To: Leo Gaspard; +Cc: Joey Hess, git, Jeff King, Brandon Williams On Fri, Feb 09 2018, Leo Gaspard jotted: > On 02/09/2018 11:04 PM, Ævar Arnfjörð Bjarmason wrote:>>> You could also > have some intermediate step between these two, where >>>> e.g. your refspec for "origin" is >>>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default >>>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that >>>> location, then you move them (with some alias/hook) to >>>> "refs/remotes/origin/*" once they're seen to be "OK". >>> >>> That is indeed another possibility, but then the idea is to make things >>> as transparent as possible for the end-user, not to completely change >>> their git workflow. As such, considering only signed commits to be part >>> of the upstream seems to make sense to me? >> >> I mean this would be something that would be part of a post-fetch hook, >> so it would be as transparent as what you're doing to the user, with the >> difference that it doesn't need to involve changes to what you slurp >> down from the server. >> >> I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run >> your post-fetch hook and you go over the new refs, and copy what you >> like (usually everything) to refs/remotes/origin/*. > > Hmm... but that would then require a post-fetch hook, wouldn't it? And > about a post-fetch hook, if I understood correctly, Junio in [1] had a > quite nice argument against it: > > Although I do not deeply care between such a "trigger to only > notify, no touching" hook and a full-blown "allow hook writers to > easily lie about what happened in the fetch" hook, I was hoping that > we would get this right and useful if we were spending our brain > bandwidth on it. I am not very fond of an easier "trigger to only > notify" hook because people are bound to misuse the interface and > try updating the refs anyway, making it easy to introduce > inconsistencies between refs and FETCH_HEAD that will confuse the > later "merge" step. > > Otherwise, if it doesn't require a post-fetch hook, then it would > require the end-user to first fetch, then run the > `copy-trusted-refs-over` script, which would add stuff to the user's > workflow. > > Did I miss another possibility? Yes, the assumption is that there would be a post-fetch hook of some sort to ferry the refs over from the quarantine. My reading of that thread is not that Junio's outright against such a facility (and also it was 2011 and the discussion can be re-visited), but rather that there's specific concerns that need to be kept in mind so reliable things involving the ref store don't become brittle as a result of unstable user hooks, or us offering an interface that's easily misused. >> [...] >> >> One thing that's not discussed yet, and I know just enough about for it >> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & >> Brandon who know more) is that this feature once shipped might cause >> higher load on git hosting providers. >> >> This is because people will inevitably use it in popular projects for >> some custom filtering, and because you're continually re-fetching and >> inspecting stuff what used to be a really cheap no-op "pull" most of the >> time is a more expensive negotiation every time before the client >> rejects the refs again, and worse for hosting providers because you have >> bespoke ref fetching strategies you have less odds of being able to >> cache both the negotiation and the pack you serve. >> >> I.e. you want this for some security feature where 99.99% of the time >> you accept all refs, but most people will probably use this to implement >> dynamic Turing-complete refspecs. >> >> Maybe that's worrying about nothing, but worth thinking about. > > Well... First, I must say I didn't really understand your last paragraph > about Turing-complete refspecs. > > But my understanding of how the fetch-hook patchset I sent this evening > works is that it first receives all the objects from the hosting > provider, then locally moves the refs, but never actually discards the > downloaded objects (well, until a `git gc` I guess). > > So I don't think the network traffic with the provider would be any > different wrt. what it is now, even if a tweak-fetch hook rejects some > commits? Then again I don't know git's internals enough to be have even > a bit of certainty about what I'm saying right now, so... > > > [1] https://marc.info/?l=git&m=132480559712592&w=2 As Jeff notes in the side-thread in <20180209223011.GA24578@sigill.intra.peff.net> when you do a "fetch" the protocol is not negotiating on the basis of what loose unreferenced (because you threw away the ref!) objects you have already, but what the ref commonality is between you and the server (and this is just on a best-effort basis). So for example, let's say you only accept the "master" branch and have a hook that's refusing updates to the "dev" branch, fetch is going to be re-negotiating fetching the difference between the two over the wire every time, only to have the hook say "no thanks" and throw away the result. I.e. as an extreme example, if the dev branch is adding a 1GB file divergent from master, you are going to be re-downloading that 1GB every time you fetch, even though you have that blob locally, the remote can't know that, and because you have no ref pointing to it you can't mention it during the negotiation. Which is way less efficient than the case where your refspec only fetches "master", since then we won't try at all, or the case where you fetch both master & dev, but into some quarantine area and are just deciding to update a local ref or not. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 22:04 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-09 22:24 ` Fetch-hooks Leo Gaspard @ 2018-02-09 22:30 ` Jeff King 2018-02-09 22:45 ` Fetch-hooks Junio C Hamano 2018-02-09 23:49 ` Fetch-hooks Leo Gaspard 1 sibling, 2 replies; 38+ messages in thread From: Jeff King @ 2018-02-09 22:30 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Leo Gaspard, Joey Hess, git, Brandon Williams On Fri, Feb 09, 2018 at 11:04:17PM +0100, Ævar Arnfjörð Bjarmason wrote: > One thing that's not discussed yet, and I know just enough about for it > to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & > Brandon who know more) is that this feature once shipped might cause > higher load on git hosting providers. > > This is because people will inevitably use it in popular projects for > some custom filtering, and because you're continually re-fetching and > inspecting stuff what used to be a really cheap no-op "pull" most of the > time is a more expensive negotiation every time before the client > rejects the refs again, and worse for hosting providers because you have > bespoke ref fetching strategies you have less odds of being able to > cache both the negotiation and the pack you serve. Most of the discussion so far seems to be about "accept this ref or don't accept this ref", which seems OK. But if you are going to do custom tweaking like rewriting objects, or making it common to refuse some refs, then I think things get pretty inefficient for _everybody_. The negotiation for future fetches uses the existing refs as the starting point. And if we don't know that we have the objects because there are no refs pointing at them, they're going to get transferred again. That's extra load no the server, and extra time for the user waiting on the network. I tend to agree with the direction of thinking you outlined: you're generally better off completing the fetch to a local namespace that tracks the other side completely, and then manipulating the local refs as you see fit (e.g., fetching into refs/quarantine, and then migrating "good" refs over to refs/remotes/origin). -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 22:30 ` Fetch-hooks Jeff King @ 2018-02-09 22:45 ` Junio C Hamano 2018-02-09 23:49 ` Fetch-hooks Leo Gaspard 1 sibling, 0 replies; 38+ messages in thread From: Junio C Hamano @ 2018-02-09 22:45 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Leo Gaspard, Joey Hess, git, Brandon Williams Jeff King <peff@peff.net> writes: > The negotiation for future fetches uses the existing refs as the > starting point. And if we don't know that we have the objects because > there are no refs pointing at them, they're going to get transferred > again. That's extra load no the server, and extra time for the user > waiting on the network. > > I tend to agree with the direction of thinking you outlined: you're > generally better off completing the fetch to a local namespace that > tracks the other side completely, and then manipulating the local refs > as you see fit (e.g., fetching into refs/quarantine, and then migrating > "good" refs over to refs/remotes/origin). Thanks for a dose of sanity ;-) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 22:30 ` Fetch-hooks Jeff King 2018-02-09 22:45 ` Fetch-hooks Junio C Hamano @ 2018-02-09 23:49 ` Leo Gaspard 2018-02-10 0:13 ` Fetch-hooks Jeff King 1 sibling, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-09 23:49 UTC (permalink / raw) To: Jeff King, Ævar Arnfjörð Bjarmason Cc: Joey Hess, git, Brandon Williams, Junio C Hamano On 02/09/2018 11:30 PM, Jeff King wrote: > On Fri, Feb 09, 2018 at 11:04:17PM +0100, Ævar Arnfjörð Bjarmason wrote: >> One thing that's not discussed yet, and I know just enough about for it >> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff & >> Brandon who know more) is that this feature once shipped might cause >> higher load on git hosting providers. >> >> This is because people will inevitably use it in popular projects for >> some custom filtering, and because you're continually re-fetching and >> inspecting stuff what used to be a really cheap no-op "pull" most of the >> time is a more expensive negotiation every time before the client >> rejects the refs again, and worse for hosting providers because you have >> bespoke ref fetching strategies you have less odds of being able to >> cache both the negotiation and the pack you serve. > > Most of the discussion so far seems to be about "accept this ref or > don't accept this ref", which seems OK. But if you are going to do > custom tweaking like rewriting objects, or making it common to refuse > some refs, then I think things get pretty inefficient for _everybody_. > > The negotiation for future fetches uses the existing refs as the > starting point. And if we don't know that we have the objects because > there are no refs pointing at them, they're going to get transferred > again. That's extra load no the server, and extra time for the user > waiting on the network. Oh. I thought the protocol git used was something like client: I want to fetch refs A and B server: so you'll need objects 12345678 and 90ABCDEF, A and B both point to 12345678 client: please give me object 12345678 server: here it is [...] I was clearly wrong, thanks! (and thanks Ævar for your explanation in the side-thread, too!) > I tend to agree with the direction of thinking you outlined: you're > generally better off completing the fetch to a local namespace that > tracks the other side completely, and then manipulating the local refs > as you see fit (e.g., fetching into refs/quarantine, and then migrating > "good" refs over to refs/remotes/origin). Hmm... so do I understand it correctly when I say the process you're thinking about works like this? * User installs hook for my-remote by running [something] * User runs git fetch * git fetch fetches remote refs/heads/* to local refs/quarantine/* (so I guess [something] changes the remote.my-remote.fetch refmap) * When this is done `git fetch` runs a notification-only post-fetch hook (that would need to be added) * The post-fetch hook then performs whatever it wants and updates the references in refs/remotes/my-remote/* So the changes that are required are: * Adding a notification-only post-fetch hook * For handling tags, there is a need to have a refmap for tags. Maybe adding a remote.my-remote.fetchTags refmap, that would be used when running with --tags, and having it default to “refs/tags/*:refs/tags/*” to keep the current behavior by default? The only remaining issue I can think of is: How do we avoid the issue of the trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification raised in the side-thread that Junio wrote in [1]? Maybe just writing in the documentation that the hook should use a quarantine-like approach if it wants modification would be enough to not have hook authors try to modify the ref in the post-fetch hook? Thanks for all your thoughts, and hope we're getting somewhere! Leo PS: I'll read over the reviews once I'm all clear as to what exactly is wanted for this patch, as most likely they'll just be dumped, given the current state of affairs. [1] https://marc.info/?l=git&m=132480559712592&w=2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 23:49 ` Fetch-hooks Leo Gaspard @ 2018-02-10 0:13 ` Jeff King 2018-02-10 0:37 ` Fetch-hooks Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Jeff King @ 2018-02-10 0:13 UTC (permalink / raw) To: Leo Gaspard Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote: > > I tend to agree with the direction of thinking you outlined: you're > > generally better off completing the fetch to a local namespace that > > tracks the other side completely, and then manipulating the local refs > > as you see fit (e.g., fetching into refs/quarantine, and then migrating > > "good" refs over to refs/remotes/origin). > > Hmm... so do I understand it correctly when I say the process you're > thinking about works like this? > * User installs hook for my-remote by running [something] > * User runs git fetch > * git fetch fetches remote refs/heads/* to local refs/quarantine/* (so > I guess [something] changes the remote.my-remote.fetch refmap) > * When this is done `git fetch` runs a notification-only post-fetch > hook (that would need to be added) > * The post-fetch hook then performs whatever it wants and updates the > references in refs/remotes/my-remote/* Yeah, that was roughly my thinking. > So the changes that are required are: > * Adding a notification-only post-fetch hook > * For handling tags, there is a need to have a refmap for tags. Maybe > adding a remote.my-remote.fetchTags refmap, that would be used when > running with --tags, and having it default to “refs/tags/*:refs/tags/*” > to keep the current behavior by default? Yeah, tag-following may be a little tricky, because it usually wants to write to refs/tags/. One workaround would be to have your config look like this: [remote "origin"] fetch = +refs/heads/*:refs/quarantine/origin/heads/* fetch = +refs/tags/*:refs/quarantine/origin/tags/* tagOpt = --no-tags That's not exactly the same thing, because it would fetch all tags, not just those that point to the history on the branches. But in most repositories and workflows the distinction doesn't matter. (By the way, the I specifically chose the name "refs/quarantine" instead of anything in "refs/remotes" because we'd want to make sure that the "git checkout" DWIM behavior cannot accidentally pull from quarantine). > The only remaining issue I can think of is: How do we avoid the issue > of the > trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification > raised in the side-thread that Junio wrote in [1]? Maybe just writing > in the documentation that the hook should use a quarantine-like > approach if it wants modification would be enough to not have hook > authors try to modify the ref in the post-fetch hook? I don't have a silver bullet there. Documenting the "right" way at least seems like a good first step. -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 0:13 ` Fetch-hooks Jeff King @ 2018-02-10 0:37 ` Leo Gaspard 2018-02-10 1:08 ` Fetch-hooks Junio C Hamano 2018-02-10 12:21 ` Fetch-hooks Jeff King 0 siblings, 2 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-10 0:37 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On 02/10/2018 01:13 AM, Jeff King wrote: > On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote: >> So the changes that are required are: >> * Adding a notification-only post-fetch hook >> * For handling tags, there is a need to have a refmap for tags. Maybe >> adding a remote.my-remote.fetchTags refmap, that would be used when >> running with --tags, and having it default to “refs/tags/*:refs/tags/*” >> to keep the current behavior by default? > > Yeah, tag-following may be a little tricky, because it usually wants to > write to refs/tags/. One workaround would be to have your config look > like this: > > [remote "origin"] > fetch = +refs/heads/*:refs/quarantine/origin/heads/* > fetch = +refs/tags/*:refs/quarantine/origin/tags/* > tagOpt = --no-tags > > That's not exactly the same thing, because it would fetch all tags, not > just those that point to the history on the branches. But in most > repositories and workflows the distinction doesn't matter. Hmm... apart from the implementation complexity (of which I have no idea), is there an argument against the idea of adding a remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used every time a tag is fetched? (well, maybe not exactly similar to remote.<name>.fetch because we know the source is going to be refs/tags/*; so just having the part of .fetch past the ':' would be more like what's needed I guess) The issue with your solution is that if the user runs 'git fetch --tags', he will get the (potentially compromised) tags directly in his refs/tags/. > (By the way, the I specifically chose the name "refs/quarantine" instead > of anything in "refs/remotes" because we'd want to make sure that the > "git checkout" DWIM behavior cannot accidentally pull from quarantine). (Indeed, I understood after reading it, and would likely not have thought of it otherwise, thanks!) >> The only remaining issue I can think of is: How do we avoid the issue >> of the >> trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification >> raised in the side-thread that Junio wrote in [1]? Maybe just writing >> in the documentation that the hook should use a quarantine-like >> approach if it wants modification would be enough to not have hook >> authors try to modify the ref in the post-fetch hook? > > I don't have a silver bullet there. Documenting the "right" way at least > seems like a good first step. So long as it's not a merge-blocker it's good with me! (but then I'm likely not the one who's going to be pointed at when things go wrong in a hook, so I'm clearly biased on this matter) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 0:37 ` Fetch-hooks Leo Gaspard @ 2018-02-10 1:08 ` Junio C Hamano 2018-02-10 1:33 ` Fetch-hooks Leo Gaspard 2018-02-10 12:21 ` Fetch-hooks Jeff King 1 sibling, 1 reply; 38+ messages in thread From: Junio C Hamano @ 2018-02-10 1:08 UTC (permalink / raw) To: Leo Gaspard Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams Leo Gaspard <leo@gaspard.io> writes: > On 02/10/2018 01:13 AM, Jeff King wrote: >> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote: >>> So the changes that are required are: >>> * Adding a notification-only post-fetch hook Maybe I missed a very early part of the discussion, but why does this even need a hook? There are some numbers [*1*] of classes of valid reasons we may want to have hooks triggered by operations, but "always do something locally after doing something else locally, regardless of the outcome of that something else" feels like the most typical anti-pattern that we do not want a hook for. If you are doing "git fetch" (or "git pull"), you already know you are doing that and you donot need a notification. You just create a workflow specific script that calls fetch or pull, followed by whatever you want to do and use that, instead of doing "git pull", and that is not any extra work than writing a hook and installing it. Unlike something like post-receive, which happens on the remote side where you may not even have an interactive access to, in response to the operation you locally do (i.e. "git push"), fetching and then doing something else in a repository you fetch into has no reason to be done as a hook. [Footnote] *1* I think the number was 5 when I last counted/enumerated, but don't quote me on that ;-) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 1:08 ` Fetch-hooks Junio C Hamano @ 2018-02-10 1:33 ` Leo Gaspard 2018-02-10 18:03 ` Fetch-hooks Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-10 1:33 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams On 02/10/2018 02:08 AM, Junio C Hamano wrote: > Leo Gaspard <leo@gaspard.io> writes: > >> On 02/10/2018 01:13 AM, Jeff King wrote: >>> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote: >>>> So the changes that are required are: >>>> * Adding a notification-only post-fetch hook > > Maybe I missed a very early part of the discussion, but why does > this even need a hook? There are some numbers [*1*] of classes of > valid reasons we may want to have hooks triggered by operations, but > "always do something locally after doing something else locally, > regardless of the outcome of that something else" feels like the > most typical anti-pattern that we do not want a hook for. If you > are doing "git fetch" (or "git pull"), you already know you are > doing that and you donot need a notification. You just create a > workflow specific script that calls fetch or pull, followed by > whatever you want to do and use that, instead of doing "git pull", > and that is not any extra work than writing a hook and installing > it. > > Unlike something like post-receive, which happens on the remote side > where you may not even have an interactive access to, in response to > the operation you locally do (i.e. "git push"), fetching and then > doing something else in a repository you fetch into has no reason to > be done as a hook. I guess the very early part of the discussion you're speaking of is what I was assuming after reading https://marc.info/?l=git&m=132478296309094&w=2 Then, it's always possible to just write workflow-specific scripts that manually run git fetch then copy the refs (resp. run git fetch then copy the refs then run git merge) to get git myfetch (resp. git mypull) [1]. But then, the question is, why is there a pre-push hook? it's already possible to have a custom script that first runs the hook then runs git push, for most if not all of the use cases of the pre-push hook. Yet the possibility to not change the end-user's workflow is what makes pre-push (or pre-commit, with which the similarity is perhaps even more obvious) so useful. So the reason for a post-fetch in my opinion is the same as for a pre-push / pre-commit: not changing the user's workflow, while providing additional features. [1] Which makes me notice that actually the post-fetch hook technique we were discussing with Peff suffers the same not-updating-FETCH_HEAD issue that was discussed in this early thread: a `git pull` would try to merge from refs/quarantine, I guess. So things are a bit harder than we thought. I guess the tweak-fetch hook could be left “as-is”, but with git automatically doing the “quarantine-ing” thing transparently so that the references that the end-user (or the hook for that matter) see are the “curated” ones? Then it's too late for me to think efficiently right now, so that idea may be stupid or over-complex. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 1:33 ` Fetch-hooks Leo Gaspard @ 2018-02-10 18:03 ` Leo Gaspard 0 siblings, 0 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-10 18:03 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams On 02/10/2018 02:33 AM, Leo Gaspard wrote:> I guess the very early part of the discussion you're speaking of is what > I was assuming after reading > https://marc.info/?l=git&m=132478296309094&w=2 > > [...] > > So the reason for a post-fetch in my opinion is the same as for a > pre-push / pre-commit: not changing the user's workflow, while providing > additional features. Well, re-reading my email, it looks aggressive to me now... sorry about that! What I meant was basically, that in my mind pre-push or pre-commit are also local-only things, and they are really useful in that they allow to block the push or the commit if some conditions are not met (eg. block commit with trailing whitespace, or block push of unsigned commits). In pretty much the same way, what I'm really looking for is a way to “block the fetch” (from a user-visible standpoint) -- like pre-push but in the opposite direction. I hope such a goal is not an anti-pattern for hook design? In order to do this, I first tried updating this tweak-fetch hook patch from late 2011, and then learned that it would cause too high a load on servers. Then, while brainstorming another solution, this idea of a notification-only post-fetch hook arose. But, as I noticed while writing my previous answer, this suffers the same the-hook-writer-must-correctly-update-FETCH_HEAD-concurrently-with-remote-branch issue that you pointed out just after the “*HOWEVER*” in [1]. So that solution is likely a bad solution too. I guess we'll continue the search for a good solution in the side-thread with Peff, hoping to figure one out. That being said about what I meant in my last email, obviously you're the one who has the say on what goes in or not! And if it's an anti-feature I'd much rather know it now than after spending a few nights coding :) So, what do you think about this use case I'm thinking of? (ie. “blocking the fetch” like pre-push “blocks the push” and pre-commit “blocks the commit”?) [1] https://marc.info/?l=git&m=132478296309094&w=2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 0:37 ` Fetch-hooks Leo Gaspard 2018-02-10 1:08 ` Fetch-hooks Junio C Hamano @ 2018-02-10 12:21 ` Jeff King 2018-02-10 18:36 ` Fetch-hooks Leo Gaspard 2018-02-14 1:46 ` Fetch-hooks Jacob Keller 1 sibling, 2 replies; 38+ messages in thread From: Jeff King @ 2018-02-10 12:21 UTC (permalink / raw) To: Leo Gaspard Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote: > > Yeah, tag-following may be a little tricky, because it usually wants to > > write to refs/tags/. One workaround would be to have your config look > > like this: > > > > [remote "origin"] > > fetch = +refs/heads/*:refs/quarantine/origin/heads/* > > fetch = +refs/tags/*:refs/quarantine/origin/tags/* > > tagOpt = --no-tags > > > > That's not exactly the same thing, because it would fetch all tags, not > > just those that point to the history on the branches. But in most > > repositories and workflows the distinction doesn't matter. > > Hmm... apart from the implementation complexity (of which I have no > idea), is there an argument against the idea of adding a > remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used > every time a tag is fetched? (well, maybe not exactly similar to > remote.<name>.fetch because we know the source is going to be > refs/tags/*; so just having the part of .fetch past the ':' would be > more like what's needed I guess) I don't think it would be too hard to implement, and is at least logically consistent with the way we handle tags. If we were designing from scratch, I do think this would all be helped immensely by having more separation of refs fetched from remotes. There was a proposal in the v1.8 era to fetch everything for a remote, including tags, into "refs/remotes/origin/heads/", "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to look for tags in each of the remote-tag namespaces. I still think that would be a good direction to go, but it would be a big project (which is why the original stalled). > The issue with your solution is that if the user runs 'git fetch > --tags', he will get the (potentially compromised) tags directly in his > refs/tags/. True. -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 12:21 ` Fetch-hooks Jeff King @ 2018-02-10 18:36 ` Leo Gaspard 2018-02-12 19:23 ` Fetch-hooks Brandon Williams 2018-02-14 1:35 ` Fetch-hooks Jeff King 2018-02-14 1:46 ` Fetch-hooks Jacob Keller 1 sibling, 2 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-10 18:36 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On 02/10/2018 01:21 PM, Jeff King wrote: > On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote: > >>> Yeah, tag-following may be a little tricky, because it usually wants to >>> write to refs/tags/. One workaround would be to have your config look >>> like this: >>> >>> [remote "origin"] >>> fetch = +refs/heads/*:refs/quarantine/origin/heads/* >>> fetch = +refs/tags/*:refs/quarantine/origin/tags/* >>> tagOpt = --no-tags >>> >>> That's not exactly the same thing, because it would fetch all tags, not >>> just those that point to the history on the branches. But in most >>> repositories and workflows the distinction doesn't matter. >> >> Hmm... apart from the implementation complexity (of which I have no >> idea), is there an argument against the idea of adding a >> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used >> every time a tag is fetched? (well, maybe not exactly similar to >> remote.<name>.fetch because we know the source is going to be >> refs/tags/*; so just having the part of .fetch past the ':' would be >> more like what's needed I guess) > > I don't think it would be too hard to implement, and is at least > logically consistent with the way we handle tags. > > If we were designing from scratch, I do think this would all be helped > immensely by having more separation of refs fetched from remotes. There > was a proposal in the v1.8 era to fetch everything for a remote, > including tags, into "refs/remotes/origin/heads/", > "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to > look for tags in each of the remote-tag namespaces. > > I still think that would be a good direction to go, but it would be a > big project (which is why the original stalled). Hmm... would this also drown the remote.<name>.fetch map? Also, I think this behavior could be emulated with fetch and fetchTagsTo, and it would look like: [remote "my-remote"] fetch = +refs/heads/*:refs/remotes/my-remote/heads/* fetchTagsTo = refs/remotes/my-remote/tags/* The remaining issue being to teach the lookup side to look for tags in all the remote-tag namespaces (and the fact it's a breaking change). That said, actually I just noticed an issue in the “add a remote.<name>.fetch option to fetch to refs/quarantine then have the post-fetch hook do the work”: it means if I run `git pull`, then: 1. The remote references will be pulled to refs/quarantine/... 2. The post-fetch hook will copy the accepted ones to refs/remotes/... 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into local branches... and so merge from refs/quarantine. A solution would be to also update FETCH_HEAD in the post-fetch hook, but then we're back to the issue raised by Junio after the “*HOWEVER*” of [1]: the hook writer has to maintain consistency between the “copied” references and FETCH_HEAD. So, when thinking about it, I'm back to thinking the proper hook interface should be the one of the tweak-fetch hook, but its implementation should make it not go crazy on remote servers. And so that the implementation should do all this refs/quarantine wizardry inside git itself. So basically what I'm getting at at the moment is that git fetch should: 1. fetch all the references to refs/real-remotes/<name>/{insert here a copy of the refs/ tree of <name>} 2. figure out what the “expected” names for these references will by, by looking at remote.<name>.fetch (and at whether --tags was passed) 3. for each “expected” name, 1. if a tweak-fetch hook is present, call it with the refs/real-remotes/... refname and the “expected end-name” from remote.<name>.fetch 2. based on the hook's result, potentially to move the “expected end-name” to another commit than the one referenced by refs/real-remotes/... 3. move the “expected” name to the commit referenced in refs/real-remotes Which would make the tweak-fetch hook interface simpler (though more restrictive, but I don't have any real use case for the other change possibilities) than it is now: 1. feed the hook with lines of “refs/real-remotes/my-remote/heads/my-branch refs/remotes/my-remote/my-branch” (assuming remote.my-remote.fetch is “normal”) or “refs/real-remotes/my-remote/tags/my-tag refs/tags/my-tag” (if my-tag is being fetched from my-remote) 2. read lines of “<refspec> refs/remotes/my-remote/my-branch”, that will re-point my-branch to <refspec> instead of refs/real-remotes/my-remote/heads/my-branch. In order to avoid any issue, the hook is not allowed to pass as second output a reference that was not passed as second input. This way, the behavior of the tweak-fetch hook is reasonably preserved (at least for my use case), and there is no additional load on the servers thanks to the up-to-date references being stored in refs/real-remotes/<name>/<refspec> Does this reasoning make any sense? [1] https://marc.info/?l=git&m=132478296309094&w=2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 18:36 ` Fetch-hooks Leo Gaspard @ 2018-02-12 19:23 ` Brandon Williams 2018-02-13 15:44 ` Fetch-hooks Leo Gaspard 2018-02-14 1:38 ` Fetch-hooks Jeff King 2018-02-14 1:35 ` Fetch-hooks Jeff King 1 sibling, 2 replies; 38+ messages in thread From: Brandon Williams @ 2018-02-12 19:23 UTC (permalink / raw) To: Leo Gaspard Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess, git, Junio C Hamano On 02/10, Leo Gaspard wrote: > On 02/10/2018 01:21 PM, Jeff King wrote: > > On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote: > > > >>> Yeah, tag-following may be a little tricky, because it usually wants to > >>> write to refs/tags/. One workaround would be to have your config look > >>> like this: > >>> > >>> [remote "origin"] > >>> fetch = +refs/heads/*:refs/quarantine/origin/heads/* > >>> fetch = +refs/tags/*:refs/quarantine/origin/tags/* > >>> tagOpt = --no-tags > >>> > >>> That's not exactly the same thing, because it would fetch all tags, not > >>> just those that point to the history on the branches. But in most > >>> repositories and workflows the distinction doesn't matter. > >> > >> Hmm... apart from the implementation complexity (of which I have no > >> idea), is there an argument against the idea of adding a > >> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used > >> every time a tag is fetched? (well, maybe not exactly similar to > >> remote.<name>.fetch because we know the source is going to be > >> refs/tags/*; so just having the part of .fetch past the ':' would be > >> more like what's needed I guess) > > > > I don't think it would be too hard to implement, and is at least > > logically consistent with the way we handle tags. > > > > If we were designing from scratch, I do think this would all be helped > > immensely by having more separation of refs fetched from remotes. There > > was a proposal in the v1.8 era to fetch everything for a remote, > > including tags, into "refs/remotes/origin/heads/", > > "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to > > look for tags in each of the remote-tag namespaces. > > > > I still think that would be a good direction to go, but it would be a > > big project (which is why the original stalled). > > Hmm... would this also drown the remote.<name>.fetch map? Also, I think > this behavior could be emulated with fetch and fetchTagsTo, and it would > look like: > [remote "my-remote"] > fetch = +refs/heads/*:refs/remotes/my-remote/heads/* > fetchTagsTo = refs/remotes/my-remote/tags/* > The remaining issue being to teach the lookup side to look for tags in > all the remote-tag namespaces (and the fact it's a breaking change). > > That said, actually I just noticed an issue in the “add a > remote.<name>.fetch option to fetch to refs/quarantine then have the > post-fetch hook do the work”: it means if I run `git pull`, then: > 1. The remote references will be pulled to refs/quarantine/... > 2. The post-fetch hook will copy the accepted ones to refs/remotes/... > 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into > local branches... and so merge from refs/quarantine. > > A solution would be to also update FETCH_HEAD in the post-fetch hook, > but then we're back to the issue raised by Junio after the “*HOWEVER*” > of [1]: the hook writer has to maintain consistency between the “copied” > references and FETCH_HEAD. > > So, when thinking about it, I'm back to thinking the proper hook > interface should be the one of the tweak-fetch hook, but its > implementation should make it not go crazy on remote servers. And so > that the implementation should do all this refs/quarantine wizardry > inside git itself. > > So basically what I'm getting at at the moment is that git fetch should: > 1. fetch all the references to refs/real-remotes/<name>/{insert here a > copy of the refs/ tree of <name>} > 2. figure out what the “expected” names for these references will by, > by looking at remote.<name>.fetch (and at whether --tags was passed) > 3. for each “expected” name, > 1. if a tweak-fetch hook is present, call it with the > refs/real-remotes/... refname and the “expected end-name” from > remote.<name>.fetch > 2. based on the hook's result, potentially to move the “expected > end-name” to another commit than the one referenced by refs/real-remotes/... > 3. move the “expected” name to the commit referenced in > refs/real-remotes > > Which would make the tweak-fetch hook interface simpler (though more > restrictive, but I don't have any real use case for the other change > possibilities) than it is now: > 1. feed the hook with lines of > “refs/real-remotes/my-remote/heads/my-branch > refs/remotes/my-remote/my-branch” (assuming remote.my-remote.fetch is > “normal”) or “refs/real-remotes/my-remote/tags/my-tag refs/tags/my-tag” > (if my-tag is being fetched from my-remote) > 2. read lines of “<refspec> refs/remotes/my-remote/my-branch”, that > will re-point my-branch to <refspec> instead of > refs/real-remotes/my-remote/heads/my-branch. In order to avoid any > issue, the hook is not allowed to pass as second output a reference that > was not passed as second input. > > This way, the behavior of the tweak-fetch hook is reasonably preserved > (at least for my use case), and there is no additional load on the > servers thanks to the up-to-date references being stored in > refs/real-remotes/<name>/<refspec> > > Does this reasoning make any sense? > > > [1] https://marc.info/?l=git&m=132478296309094&w=2 Maybe this isn't helpful but you may be able to implement this by using a remote-helper. The helper could perform any sort of caching it needed to prevent re-downloading large amounts of data that is potentially thrown away, while only sending through the relevant commits which satisfy some criteria (signed, etc.). -- Brandon Williams ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-12 19:23 ` Fetch-hooks Brandon Williams @ 2018-02-13 15:44 ` Leo Gaspard 2018-02-14 1:38 ` Fetch-hooks Jeff King 1 sibling, 0 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-13 15:44 UTC (permalink / raw) To: Brandon Williams Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess, git, Junio C Hamano On 02/12/2018 08:23 PM, Brandon Williams wrote:> Maybe this isn't helpful but you may be able to implement this by using > a remote-helper. The helper could perform any sort of caching it needed > to prevent re-downloading large amounts of data that is potentially > thrown away, while only sending through the relevant commits which > satisfy some criteria (signed, etc.). This looks like a possibility, thanks! I'll likely try to implement it this way if there can be no consensus on the features wanted for a fetch-hook (as the current state of discussion looks like to me). ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-12 19:23 ` Fetch-hooks Brandon Williams 2018-02-13 15:44 ` Fetch-hooks Leo Gaspard @ 2018-02-14 1:38 ` Jeff King 1 sibling, 0 replies; 38+ messages in thread From: Jeff King @ 2018-02-14 1:38 UTC (permalink / raw) To: Brandon Williams Cc: Leo Gaspard, Ævar Arnfjörð Bjarmason, Joey Hess, git, Junio C Hamano On Mon, Feb 12, 2018 at 11:23:27AM -0800, Brandon Williams wrote: > Maybe this isn't helpful but you may be able to implement this by using > a remote-helper. The helper could perform any sort of caching it needed > to prevent re-downloading large amounts of data that is potentially > thrown away, while only sending through the relevant commits which > satisfy some criteria (signed, etc.). Interesting idea, though it would still be calling git-fetch under the hood, right? So I guess we're just reimplementing this "quarantine" concept, but in theory we'd have the flexibility to store that quarantine in another one-off sub-repository (instead of a special ref namespace, though I suppose you could use the special ref namespace, too). I suspect it could work, but there would probably be a lot of rough edges. -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 18:36 ` Fetch-hooks Leo Gaspard 2018-02-12 19:23 ` Fetch-hooks Brandon Williams @ 2018-02-14 1:35 ` Jeff King 2018-02-14 2:02 ` Fetch-hooks Leo Gaspard 1 sibling, 1 reply; 38+ messages in thread From: Jeff King @ 2018-02-14 1:35 UTC (permalink / raw) To: Leo Gaspard Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On Sat, Feb 10, 2018 at 07:36:47PM +0100, Leo Gaspard wrote: > Hmm... would this also drown the remote.<name>.fetch map? Also, I think > this behavior could be emulated with fetch and fetchTagsTo, and it would > look like: > [remote "my-remote"] > fetch = +refs/heads/*:refs/remotes/my-remote/heads/* > fetchTagsTo = refs/remotes/my-remote/tags/* > The remaining issue being to teach the lookup side to look for tags in > all the remote-tag namespaces (and the fact it's a breaking change). Right, I think fetching into the right spots is the easy part. Designing the new lookup rules is the tricky part. If you're really interested in the gory details, here's a very old discussion on it: https://public-inbox.org/git/AANLkTi=yFwOAQMHhvLsB1_xmYOE9HHP2YB4H4TQzwwc8@mail.gmail.com/ I think there may have been some more concrete proposals after that, but that's what I was able to dig up quickly. > That said, actually I just noticed an issue in the “add a > remote.<name>.fetch option to fetch to refs/quarantine then have the > post-fetch hook do the work”: it means if I run `git pull`, then: > 1. The remote references will be pulled to refs/quarantine/... > 2. The post-fetch hook will copy the accepted ones to refs/remotes/... > 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into > local branches... and so merge from refs/quarantine. Good point. You can't munge FETCH_HEAD by playing with refspecs. I am starting to come around to the idea that "pre-fetch" might be the best way to do what you want. Not to rewrite refs, but perhaps to simply reject them. In the same way that we allow pre-receive to reject pushed refs (both are, after all, the final check on admitting new history into the repository, just in opposite directions). > So, when thinking about it, I'm back to thinking the proper hook > interface should be the one of the tweak-fetch hook, but its > implementation should make it not go crazy on remote servers. And so > that the implementation should do all this refs/quarantine wizardry > inside git itself. So does anybody actually want to be able to adjust the refs as they pass through? It really sounds like you just want to be able to reject or not reject the fetch. And that rejecting would be the uncommon case, so it's OK to just abort the whole operation and expect the user to figure it out. -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-14 1:35 ` Fetch-hooks Jeff King @ 2018-02-14 2:02 ` Leo Gaspard 2018-02-19 21:23 ` Fetch-hooks Jeff King 0 siblings, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-14 2:02 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On 02/14/2018 02:35 AM, Jeff King wrote: > On Sat, Feb 10, 2018 at 07:36:47PM +0100, Leo Gaspard wrote: >> [...] > I think there may have been some more concrete proposals after that, but > that's what I was able to dig up quickly. Thanks! Though it looks way above my knowledge of git internals as well as time available, it looks like a great project, and I hope it someday succeeds! >> That said, actually I just noticed an issue in the “add a >> remote.<name>.fetch option to fetch to refs/quarantine then have the >> post-fetch hook do the work”: it means if I run `git pull`, then: >> 1. The remote references will be pulled to refs/quarantine/... >> 2. The post-fetch hook will copy the accepted ones to refs/remotes/... >> 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into >> local branches... and so merge from refs/quarantine. > > Good point. You can't munge FETCH_HEAD by playing with refspecs. > > I am starting to come around to the idea that "pre-fetch" might be the > best way to do what you want. Not to rewrite refs, but perhaps to simply > reject them. In the same way that we allow pre-receive to reject pushed > refs (both are, after all, the final check on admitting new history into > the repository, just in opposite directions). > >> So, when thinking about it, I'm back to thinking the proper hook >> interface should be the one of the tweak-fetch hook, but its >> implementation should make it not go crazy on remote servers. And so >> that the implementation should do all this refs/quarantine wizardry >> inside git itself. > > So does anybody actually want to be able to adjust the refs as they pass > through? It really sounds like you just want to be able to reject or not > reject the fetch. And that rejecting would be the uncommon case, so it's > OK to just abort the whole operation and expect the user to figure it > out. This completely fits my use case (modulo the fact that it's more similar to the `update` hook than to `pre-receive` I think, as verifying the signature requires the objects to already have been downloaded), indeed, though I'm not sure it would have fit Joey's (based on my understanding, adding a merge was what was originally asked for). Actually, I'm wondering if the existing semantics of `update` could not be reused for the `pre-fetch`. Would it make sense to just call `update` during a fetch in the same way as during a receive-pack? That said it likely makes this a breaking change, though it's maybe unlikely that a repository is used both for fetching and for receive-pack'ing, it could happen. So the proposal as I understand it would currently be adding a `fetch-update` hook that does exactly the same thing as `update` but for `fetch`. This solves my use case in a nice way, though it likely doesn't solve Joey's original one (which has been taken care of by a wrapper since then). What do you all think about it? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-14 2:02 ` Fetch-hooks Leo Gaspard @ 2018-02-19 21:23 ` Jeff King 2018-02-19 22:50 ` Fetch-hooks Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Jeff King @ 2018-02-19 21:23 UTC (permalink / raw) To: Leo Gaspard Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On Wed, Feb 14, 2018 at 03:02:00AM +0100, Leo Gaspard wrote: > > So does anybody actually want to be able to adjust the refs as they pass > > through? It really sounds like you just want to be able to reject or not > > reject the fetch. And that rejecting would be the uncommon case, so it's > > OK to just abort the whole operation and expect the user to figure it > > out. > > This completely fits my use case (modulo the fact that it's more similar > to the `update` hook than to `pre-receive` I think, as verifying the > signature requires the objects to already have been downloaded), indeed, > though I'm not sure it would have fit Joey's (based on my understanding, > adding a merge was what was originally asked for). > > Actually, I'm wondering if the existing semantics of `update` could not > be reused for the `pre-fetch`. Would it make sense to just call `update` > during a fetch in the same way as during a receive-pack? That said it > likely makes this a breaking change, though it's maybe unlikely that a > repository is used both for fetching and for receive-pack'ing, it could > happen. > > So the proposal as I understand it would currently be adding a > `fetch-update` hook that does exactly the same thing as `update` but for > `fetch`. This solves my use case in a nice way, though it likely doesn't > solve Joey's original one (which has been taken care of by a wrapper > since then). > > What do you all think about it? I think it should be a separate hook; having an existing hook trigger in new places is likely to cause confusion and regressions. If you do go this route, please model it after "pre-receive" rather than "update". We had "update" originally but found it was too limiting for hooks to see only one ref at a time. So we introduced pre-receive. The "update" hook remains for historical reasons, but I don't think we'd want to reproduce the mistake. :) -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-19 21:23 ` Fetch-hooks Jeff King @ 2018-02-19 22:50 ` Leo Gaspard 2018-02-20 6:10 ` Fetch-hooks Jacob Keller 2018-02-20 7:42 ` Fetch-hooks Jeff King 0 siblings, 2 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-19 22:50 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On 02/19/2018 10:23 PM, Jeff King wrote: > [...] > If you do go this route, please model it after "pre-receive" rather than > "update". We had "update" originally but found it was too limiting for > hooks to see only one ref at a time. So we introduced pre-receive. The > "update" hook remains for historical reasons, but I don't think we'd > want to reproduce the mistake. :) Hmm, what bothered me with “pre-receive” was that it was an all-or-nothing decision, without the ability to allow some references through and not others. Is there a way for “pre-receive” to individually filter hooks? I was under the impression that the only way to do that was to use the “update” hook, which was the reason I wanted to model it after “update” rather than “pre-receive” (my use case being a check independent for each pushed ref) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-19 22:50 ` Fetch-hooks Leo Gaspard @ 2018-02-20 6:10 ` Jacob Keller 2018-02-20 7:42 ` Fetch-hooks Jeff King 1 sibling, 0 replies; 38+ messages in thread From: Jacob Keller @ 2018-02-20 6:10 UTC (permalink / raw) To: Leo Gaspard Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess, Git mailing list, Brandon Williams, Junio C Hamano On Mon, Feb 19, 2018 at 2:50 PM, Leo Gaspard <leo@gaspard.io> wrote: > On 02/19/2018 10:23 PM, Jeff King wrote: >> [...] >> If you do go this route, please model it after "pre-receive" rather than >> "update". We had "update" originally but found it was too limiting for >> hooks to see only one ref at a time. So we introduced pre-receive. The >> "update" hook remains for historical reasons, but I don't think we'd >> want to reproduce the mistake. :) > > Hmm, what bothered me with “pre-receive” was that it was an > all-or-nothing decision, without the ability to allow some references > through and not others. > > Is there a way for “pre-receive” to individually filter hooks? I was > under the impression that the only way to do that was to use the > “update” hook, which was the reason I wanted to model it after “update” > rather than “pre-receive” (my use case being a check independent for > each pushed ref) At least in the "push" case I think there is value in saying "if one ref fails, please fail the entire push, always". This makes sense, because if a user happens to violate some rule in one thing, they very likely may not wish any of the push to succeed, and it also creates problems with whether a push is atomic or not. For fetch, I could see either method being useful. Thanks, Jake ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-19 22:50 ` Fetch-hooks Leo Gaspard 2018-02-20 6:10 ` Fetch-hooks Jacob Keller @ 2018-02-20 7:42 ` Jeff King 2018-02-20 21:19 ` Fetch-hooks Leo Gaspard 1 sibling, 1 reply; 38+ messages in thread From: Jeff King @ 2018-02-20 7:42 UTC (permalink / raw) To: Leo Gaspard Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On Mon, Feb 19, 2018 at 11:50:37PM +0100, Leo Gaspard wrote: > On 02/19/2018 10:23 PM, Jeff King wrote: > > [...] > > If you do go this route, please model it after "pre-receive" rather than > > "update". We had "update" originally but found it was too limiting for > > hooks to see only one ref at a time. So we introduced pre-receive. The > > "update" hook remains for historical reasons, but I don't think we'd > > want to reproduce the mistake. :) > > Hmm, what bothered me with “pre-receive” was that it was an > all-or-nothing decision, without the ability to allow some references > through and not others. > > Is there a way for “pre-receive” to individually filter hooks? I was > under the impression that the only way to do that was to use the > “update” hook, which was the reason I wanted to model it after “update” > rather than “pre-receive” (my use case being a check independent for > each pushed ref) No, pre-receive is always atomic. However, that may actually be what you want, if the point is to keep untrusted data out of the repository. By rejecting the whole thing, we could in theory keep the objects from entering the repository at all. This is how the push side works via the "quarantine" system (which is explained in git-receive-pack(1)). Fetching doesn't currently quarantine objects, but it probably wouldn't be very hard to make it do so. -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-20 7:42 ` Fetch-hooks Jeff King @ 2018-02-20 21:19 ` Leo Gaspard 0 siblings, 0 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-20 21:19 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git, Brandon Williams, Junio C Hamano On 02/20/2018 08:42 AM, Jeff King wrote:>> [...] >> >> Is there a way for “pre-receive” to individually filter [refs]? I was >> under the impression that the only way to do that was to use the >> “update” hook, which was the reason I wanted to model it after “update” >> rather than “pre-receive” (my use case being a check independent for >> each pushed ref) > > No, pre-receive is always atomic. However, that may actually be what you > want, if the point is to keep untrusted data out of the repository. By > rejecting the whole thing, we could in theory keep the objects from > entering the repository at all. This is how the push side works via the > "quarantine" system (which is explained in git-receive-pack(1)). > Fetching doesn't currently quarantine objects, but it probably wouldn't > be very hard to make it do so. Oh, I didn't think about quarantining behavior, indeed! So I guess, following your answer as well as Jake's, I'll try to implement a pre-receive-like hook, and will come back to this list when I'll have a tentative implementation. Thanks for the advice! :) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-10 12:21 ` Fetch-hooks Jeff King 2018-02-10 18:36 ` Fetch-hooks Leo Gaspard @ 2018-02-14 1:46 ` Jacob Keller 1 sibling, 0 replies; 38+ messages in thread From: Jacob Keller @ 2018-02-14 1:46 UTC (permalink / raw) To: Jeff King Cc: Leo Gaspard, Ævar Arnfjörð Bjarmason, Joey Hess, Git mailing list, Brandon Williams, Junio C Hamano On Sat, Feb 10, 2018 at 4:21 AM, Jeff King <peff@peff.net> wrote: > On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote: > >> > Yeah, tag-following may be a little tricky, because it usually wants to >> > write to refs/tags/. One workaround would be to have your config look >> > like this: >> > >> > [remote "origin"] >> > fetch = +refs/heads/*:refs/quarantine/origin/heads/* >> > fetch = +refs/tags/*:refs/quarantine/origin/tags/* >> > tagOpt = --no-tags >> > >> > That's not exactly the same thing, because it would fetch all tags, not >> > just those that point to the history on the branches. But in most >> > repositories and workflows the distinction doesn't matter. >> >> Hmm... apart from the implementation complexity (of which I have no >> idea), is there an argument against the idea of adding a >> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used >> every time a tag is fetched? (well, maybe not exactly similar to >> remote.<name>.fetch because we know the source is going to be >> refs/tags/*; so just having the part of .fetch past the ':' would be >> more like what's needed I guess) > > I don't think it would be too hard to implement, and is at least > logically consistent with the way we handle tags. > > If we were designing from scratch, I do think this would all be helped > immensely by having more separation of refs fetched from remotes. There > was a proposal in the v1.8 era to fetch everything for a remote, > including tags, into "refs/remotes/origin/heads/", > "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to > look for tags in each of the remote-tag namespaces. > > I still think that would be a good direction to go, but it would be a > big project (which is why the original stalled). > I want to go this direction, but the difficulty has always been limited time to actually work on such a large project. Thanks, Jake >> The issue with your solution is that if the user runs 'git fetch >> --tags', he will get the (potentially compromised) tags directly in his >> refs/tags/. > > True. > > -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-08 17:02 ` Fetch-hooks Leo Gaspard 2018-02-08 21:06 ` Fetch-hooks Ævar Arnfjörð Bjarmason @ 2018-02-09 19:12 ` Leo Gaspard 2018-02-09 20:20 ` Fetch-hooks Joey Hess 1 sibling, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-09 19:12 UTC (permalink / raw) To: Joey Hess; +Cc: Ævar Arnfjörð Bjarmason, git On 02/08/2018 06:02 PM, Leo Gaspard wrote: > On 02/08/2018 04:30 PM, Joey Hess wrote: >> [...] > > Hmm, OK, so I guess I'll try to update the patch when I get some time to > delve into git's internals, as my use case (forbidding some fetches) > couldn't afaik be covered by a wrapper hook. Joey, I just wanted to check, you did not put the Signed-off-by line in patches in https://marc.info/?l=git&m=132491485901482&w=2 Could you confirm that the patches you sent are “covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file”, so that I could send the patch I wrote based on yours with a Signed-off-by line of my own without breaking the DCO? Thanks! Leo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Fetch-hooks 2018-02-09 19:12 ` Fetch-hooks Leo Gaspard @ 2018-02-09 20:20 ` Joey Hess 2018-02-09 21:28 ` [PATCH 0/2] fetch: add tweak-fetch hook Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Joey Hess @ 2018-02-09 20:20 UTC (permalink / raw) To: Leo Gaspard; +Cc: Ævar Arnfjörð Bjarmason, git [-- Attachment #1: Type: text/plain, Size: 722 bytes --] Leo Gaspard wrote: > I just wanted to check, you did not put the Signed-off-by line in > patches in https://marc.info/?l=git&m=132491485901482&w=2 > > Could you confirm that the patches you sent are “covered under an > appropriate open source license and I have the right under that license > to submit that work with modifications, whether created in whole or in > part by me, under the same open source license (unless I am permitted to > submit under a different license), as indicated in the file”, so that I > could send the patch I wrote based on yours with a Signed-off-by line of > my own without breaking the DCO? Yes; my patches are under the same GPL-2 as the rest of git. -- see shy jo [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 0/2] fetch: add tweak-fetch hook 2018-02-09 20:20 ` Fetch-hooks Joey Hess @ 2018-02-09 21:28 ` Leo Gaspard 2018-02-09 21:44 ` [PATCH 1/2] fetch: preparations for " Leo Gaspard 0 siblings, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-09 21:28 UTC (permalink / raw) To: git Cc: Joey Hess, Ævar Arnfjörð Bjarmason, Junio C Hamano, Johannes Sixt On 02/09/2018 09:20 PM, Joey Hess wrote:> Yes; my patches are under the same GPL-2 as the rest of git. Thanks! So here comes my patch series, heavily based on yours. There are some things to bear in mind while reviewing it: * This is my first real attempt at contributing to git, which means I could be very very far off-track * Most of it is based on trying to make the 6-year-old patch series work and pass all the tests, so if a new feature has been added since then I likely didn't notice it or don't know how to handle it correctly There are still three TODO's in the code: * In the documentation, one stating that I don't really get what this “ignore” parameter exactly does, and whether it should be handled specially (a prime example of a new feature I'm not really sure how to handle, somewhere in the code it's written all the “ignore” references are at the end of the list, but I'm already not self-confident enough about the difference between “merge” and “not-for-merge” to even consider making a good choice about how to handle “ignore”) * In `builtins/fetch.c`, function `do_fetch`, there is a conflict of interest between placing the `prune` before the `fetch` (as done by commit 10a6cc889 ("fetch --prune: Run prune before fetching", 2014-01-02)), and placing the `fetch` before the `prune` (which would allow hooks that rename the local-ref to not be prune'd and then re-fetched when doing a `git fetch --prune` -- without that a hook that would want to both read the old commit information and rename the local-ref would not be able to). Or maybe this question means actually there should be a third solution? but I don't really know what. Maybe also hooking into the prune operation? * In `templates/hooks--tweak-fetch.sample`, the “check this actually works” todo, as I'd rather first check this patch series is not too far off-topic before doing non-essential work -- anyway another version of the patch series will be required for the other two TODO's, so I can fix it at this point. That being said, what do you think about these patches? Thanks for your time! Leo Gaspard ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 1/2] fetch: preparations for tweak-fetch hook 2018-02-09 21:28 ` [PATCH 0/2] fetch: add tweak-fetch hook Leo Gaspard @ 2018-02-09 21:44 ` Leo Gaspard 2018-02-09 21:44 ` [PATCH 2/2] fetch: add " Leo Gaspard 2018-02-09 22:34 ` [PATCH 1/2] fetch: preparations for " Junio C Hamano 0 siblings, 2 replies; 38+ messages in thread From: Leo Gaspard @ 2018-02-09 21:44 UTC (permalink / raw) To: git Cc: Joey Hess, Ævar Arnfjörð Bjarmason, Junio C Hamano, Johannes Sixt, Léo Gaspard From: Léo Gaspard <leo@gaspard.io> No behavior changes yet, only some groundwork for the next change. The refs_result structure combines a status code with a ref map, which can be NULL even on success. This will be needed when there's a tweak-fetch hook, because it can filter out all refs, while still succeeding. fetch_refs returns a refs_result, so that it can modify the ref_map. Based-on-patch-by: Joey Hess <joey@kitenet.net> Signed-off-by: Leo Gaspard <leo@gaspard.io> --- builtin/fetch.c | 68 +++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 44 insertions(+), 24 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 7bbcd26fa..76dc05f61 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -34,6 +34,11 @@ enum { TAGS_SET = 2 }; +struct refs_result { + struct ref *new_refs; + int status; +}; + static int fetch_prune_config = -1; /* unspecified */ static int prune = -1; /* unspecified */ #define PRUNE_BY_DEFAULT 0 /* do we prune by default? */ @@ -57,6 +62,18 @@ static int shown_url = 0; static int refmap_alloc, refmap_nr; static const char **refmap_array; +static int add_existing(const char *refname, const struct object_id *oid, + int flag, void *cbdata) +{ + struct string_list *list = (struct string_list *)cbdata; + struct string_list_item *item = string_list_insert(list, refname); + struct object_id *old_oid = xmalloc(sizeof(*old_oid)); + + oidcpy(old_oid, oid); + item->util = old_oid; + return 0; +} + static int git_fetch_config(const char *k, const char *v, void *cb) { if (!strcmp(k, "fetch.prune")) { @@ -217,18 +234,6 @@ static void add_merge_config(struct ref **head, } } -static int add_existing(const char *refname, const struct object_id *oid, - int flag, void *cbdata) -{ - struct string_list *list = (struct string_list *)cbdata; - struct string_list_item *item = string_list_insert(list, refname); - struct object_id *old_oid = xmalloc(sizeof(*old_oid)); - - oidcpy(old_oid, oid); - item->util = old_oid; - return 0; -} - static int will_fetch(struct ref **head, const unsigned char *sha1) { struct ref *rm = *head; @@ -920,15 +925,20 @@ static int quickfetch(struct ref *ref_map) return check_connected(iterate_ref_map, &rm, &opt); } -static int fetch_refs(struct transport *transport, struct ref *ref_map) +static struct refs_result fetch_refs(struct transport *transport, + struct ref *ref_map) { - int ret = quickfetch(ref_map); - if (ret) - ret = transport_fetch_refs(transport, ref_map); - if (!ret) - ret |= store_updated_refs(transport->url, + struct refs_result ret; + ret.status = quickfetch(ref_map); + if (ret.status) { + ret.status = transport_fetch_refs(transport, ref_map); + } + if (!ret.status) { + ret.new_refs = ref_map; + ret.status |= store_updated_refs(transport->url, transport->remote->name, - ref_map); + ret.new_refs); + } transport_unlock_pack(transport); return ret; } @@ -1048,9 +1058,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen) return transport; } -static void backfill_tags(struct transport *transport, struct ref *ref_map) +static struct refs_result backfill_tags(struct transport *transport, + struct ref *ref_map) { int cannot_reuse; + struct refs_result res; /* * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it @@ -1069,12 +1081,14 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL); transport_set_option(transport, TRANS_OPT_DEPTH, "0"); transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL); - fetch_refs(transport, ref_map); + res = fetch_refs(transport, ref_map); if (gsecondary) { transport_disconnect(gsecondary); gsecondary = NULL; } + + return res; } static int do_fetch(struct transport *transport, @@ -1083,6 +1097,7 @@ static int do_fetch(struct transport *transport, struct string_list existing_refs = STRING_LIST_INIT_DUP; struct ref *ref_map; struct ref *rm; + struct refs_result res; int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; @@ -1135,7 +1150,10 @@ static int do_fetch(struct transport *transport, transport->url); } } - if (fetch_refs(transport, ref_map)) { + + res = fetch_refs(transport, ref_map); + ref_map = res.new_refs; + if (res.status) { free_refs(ref_map); retcode = 1; goto cleanup; @@ -1148,8 +1166,10 @@ static int do_fetch(struct transport *transport, struct ref **tail = &ref_map; ref_map = NULL; find_non_local_tags(transport, &ref_map, &tail); - if (ref_map) - backfill_tags(transport, ref_map); + if (ref_map) { + res = backfill_tags(transport, ref_map); + ref_map = res.new_refs; + } free_refs(ref_map); } -- 2.16.1 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 2/2] fetch: add tweak-fetch hook 2018-02-09 21:44 ` [PATCH 1/2] fetch: preparations for " Leo Gaspard @ 2018-02-09 21:44 ` Leo Gaspard 2018-02-09 22:40 ` Junio C Hamano 2018-02-09 22:34 ` [PATCH 1/2] fetch: preparations for " Junio C Hamano 1 sibling, 1 reply; 38+ messages in thread From: Leo Gaspard @ 2018-02-09 21:44 UTC (permalink / raw) To: git Cc: Joey Hess, Ævar Arnfjörð Bjarmason, Junio C Hamano, Johannes Sixt, Léo Gaspard From: Léo Gaspard <leo@gaspard.io> The tweak-fetch hook is fed lines on stdin for all refs that were fetched, and outputs on stdout possibly modified lines. Its output is then parsed and used when `git fetch` updates the remote tracking refs, records the entries in FETCH_HEAD, and produces its report. The modifications here are heavily based on prior work by Joey Hess. Based-on-patch-by: Joey Hess <joey@kitenet.net> Signed-off-by: Leo Gaspard <leo@gaspard.io> --- Documentation/githooks.txt | 37 +++++++ builtin/fetch.c | 210 +++++++++++++++++++++++++++++++++++- t/t5574-fetch-tweak-fetch-hook.sh | 90 ++++++++++++++++ templates/hooks--tweak-fetch.sample | 24 +++++ 4 files changed, 359 insertions(+), 2 deletions(-) create mode 100755 t/t5574-fetch-tweak-fetch-hook.sh create mode 100755 templates/hooks--tweak-fetch.sample diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index f877f7b7c..1b4a18bf0 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -177,6 +177,43 @@ This hook can be used to perform repository validity checks, auto-display differences from the previous HEAD if different, or set working dir metadata properties. +tweak-fetch +~~~~~~~~~~~ + +This hook is invoked by 'git fetch' (commonly called by 'git pull'), after refs +have been fetched from the remote repository. It is not executed, if nothing was +fetched. + +The output of the hook is used to update the remote-tracking branches, and +`.git/FETCH_HEAD`, in preparation for a later merge operation done by 'git +merge'. + +It takes no arguments, but is fed a line of the following format on its standard +input for each ref that was fetched. + + <sha1> SP not-for-merge|merge|ignore SP <remote-refname> SP <local-refname> LF + +Where the "not-for-merge" flag indicates the ref is not to be merged into the +current branch, and the "merge" flag indicates that 'git merge' should later +merge it. + +The `<remote-refname>` is the remote's name for the ref that was fetched, and +`<local-refname>` is a name of a remote-tracking branch, like +"refs/remotes/origin/master". `<local-refname>` can be undefined if the fetched +ref is not being stored in a local refname. In this case, it will be set to `@`, +an invalide refspec, so that scripts can be written more easily. + +TODO: Add documentation for the “ignore” parameter. Unfortunately, I'm not +really sure I get what this does or what invariants it is supposed to maintain +(eg. all “ignore” updates at the end of the refs list?), so this may also +require code changes. + +The hook must consume all of its standard input, and output back lines of the +same format. It can modify its input as desired, including adding or removing +lines, updating the sha1 (i.e. re-point the remote-tracking branch), changing +the merge flag, and changing the `<local-refname>` (i.e. use different +remote-tracking branch). + post-merge ~~~~~~~~~~ diff --git a/builtin/fetch.c b/builtin/fetch.c index 76dc05f61..1bb394530 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -28,6 +28,8 @@ static const char * const builtin_fetch_usage[] = { NULL }; +static const char tweak_fetch_hook[] = "tweak-fetch"; + enum { TAGS_UNSET = 0, TAGS_DEFAULT = 1, @@ -181,6 +183,206 @@ static struct option builtin_fetch_options[] = { OPT_END() }; +static int feed_tweak_fetch_hook(int in, int out, void *data) +{ + struct ref *ref; + struct strbuf buf = STRBUF_INIT; + const char *kw, *peer_ref; + char oid_buf[GIT_SHA1_HEXSZ + 1]; + int ret; + + for (ref = data; ref; ref = ref->next) { + if (ref->fetch_head_status == FETCH_HEAD_MERGE) + kw = "merge"; + else if (ref->fetch_head_status == FETCH_HEAD_IGNORE) + kw = "ignore"; + else + kw = "not-for-merge"; + if (!ref->name) + die("trying to fetch an inexistant ref"); + if (ref->peer_ref && ref->peer_ref->name) + peer_ref = ref->peer_ref->name; + else + peer_ref = "@"; + strbuf_addf(&buf, "%s %s %s %s\n", + oid_to_hex_r(oid_buf, &ref->old_oid), kw, + ref->name, peer_ref); + } + + ret = write_in_full(out, buf.buf, buf.len) != buf.len; + if (ret) + warning("%s hook failed to consume all its input", + tweak_fetch_hook); + close(out); + strbuf_release(&buf); + return ret; +} + +static struct ref *parse_tweak_fetch_hook_line(char *l, + struct string_list *existing_refs) +{ + struct ref *ref = NULL, *peer_ref = NULL; + struct string_list_item *peer_item = NULL; + char *words[4]; + int i, word = 0; + char *problem; + + for (i = 0; l[i]; i++) { + if (isspace(l[i])) { + l[i] = '\0'; + words[word] = l; + l += i + 1; + i = 0; + word++; + if (word > 3) { + problem = "too many words"; + goto unparsable; + } + } + } + if (word < 3) { + problem = "not enough words"; + goto unparsable; + } + + ref = alloc_ref(words[2]); + peer_ref = ref->peer_ref = alloc_ref(l); + ref->peer_ref->force = 1; + + if (get_oid_hex(words[0], &ref->old_oid)) { + problem="bad oid"; + goto unparsable; + } + + if (!strcmp(words[1], "merge")) { + ref->fetch_head_status = FETCH_HEAD_MERGE; + } else if (!strcmp(words[1], "ignore")) { + ref->fetch_head_status = FETCH_HEAD_IGNORE; + } else if (!strcmp(words[1], "not-for-merge")) { + ref->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE; + } else { + problem = "bad merge flag"; + goto unparsable; + } + + peer_item = string_list_lookup(existing_refs, peer_ref->name); + if (peer_item) + hashcpy(peer_ref->old_oid.hash, peer_item->util); + + return ref; + +unparsable: + warning("%s hook output a wrongly formed line: %s", + tweak_fetch_hook, problem); + free(ref); + free(peer_ref); + return NULL; +} + +static struct refs_result read_tweak_fetch_hook(int in) +{ + struct refs_result res; + FILE *f; + struct strbuf buf; + struct string_list existing_refs = STRING_LIST_INIT_DUP; + struct ref *ref, *prevref = NULL; + + res.status = 0; + res.new_refs = NULL; + + f = fdopen(in, "r"); + if (f == NULL) { + res.status = 1; + return res; + } + + strbuf_init(&buf, 128); + for_each_ref(add_existing, &existing_refs); + + while (strbuf_getline(&buf, f) != EOF) { + char *l = strbuf_detach(&buf, NULL); + ref = parse_tweak_fetch_hook_line(l, &existing_refs); + if (!ref) { + res.status = 1; + } else { + if (prevref) { + prevref->next = ref; + prevref = ref; + } else { + res.new_refs = prevref = ref; + } + } + free(l); + } + + string_list_clear(&existing_refs, 0); + strbuf_release(&buf); + fclose(f); + return res; +} + +/* + * The hook is fed lines of the form: + * <sha1> SP <not-for-merge|merge|ignore> SP <remote-refname> SP <local-refname> LF + * And should output rewritten lines of the same form. + */ +static struct ref *run_tweak_fetch_hook(struct ref *fetched_refs) +{ + struct child_process hook; + const char *argv[2]; + struct async async; + struct refs_result res; + + if (!fetched_refs) + return fetched_refs; + + argv[0] = find_hook(tweak_fetch_hook); + if (access(argv[0], X_OK) < 0) + return fetched_refs; + argv[1] = NULL; + + memset(&hook, 0, sizeof(hook)); + hook.argv = argv; + hook.in = -1; + hook.out = -1; + if (start_command(&hook)) + return fetched_refs; + + /* + * Use an async writer to feed the hook process. + * This allows the hook to read and write a line at + * a time without blocking. + */ + memset(&async, 0, sizeof(async)); + async.proc = feed_tweak_fetch_hook; + async.data = fetched_refs; + async.out = hook.in; + if (start_async(&async)) { + close(hook.in); + close(hook.out); + finish_command(&hook); + return fetched_refs; + } + + res = read_tweak_fetch_hook(hook.out); + res.status |= finish_async(&async); + res.status |= finish_command(&hook); + + if (res.status) { + warning("%s hook failed, ignoring its output", tweak_fetch_hook); + free(res.new_refs); + return fetched_refs; + } else { + /* + * The new_refs are returned, to be used in place of + * fetched_refs, so it is not needed anymore and can + * be freed here. + */ + free_refs(fetched_refs); + return res.new_refs; + } +} + static void unlock_pack(void) { if (gtransport) @@ -934,7 +1136,7 @@ static struct refs_result fetch_refs(struct transport *transport, ret.status = transport_fetch_refs(transport, ref_map); } if (!ret.status) { - ret.new_refs = ref_map; + ret.new_refs = run_tweak_fetch_hook(ref_map); ret.status |= store_updated_refs(transport->url, transport->remote->name, ret.new_refs); @@ -1150,7 +1352,11 @@ static int do_fetch(struct transport *transport, transport->url); } } - + // TODO(?): Were this placed above the `if (prune)`, it would avoid the + // unfortunate fact that `git fetch --prune` first drops the ref then + // re-adds it (in cases where the tweak-fetch hook renames it). There is + // likely a better solution than this one that would break Commit + // 10a6cc889 ("fetch --prune: Run prune before fetching", 2014-01-02) res = fetch_refs(transport, ref_map); ref_map = res.new_refs; if (res.status) { diff --git a/t/t5574-fetch-tweak-fetch-hook.sh b/t/t5574-fetch-tweak-fetch-hook.sh new file mode 100755 index 000000000..17cf52684 --- /dev/null +++ b/t/t5574-fetch-tweak-fetch-hook.sh @@ -0,0 +1,90 @@ +#!/bin/sh + +test_description='testing tweak-fetch-hook' +. ./test-lib.sh + +HOOKDIR="$(git rev-parse --git-dir)/hooks" +HOOK="$HOOKDIR/tweak-fetch" +mkdir -p "$HOOKDIR" + +# Setup +test_expect_success 'setup' ' + git init parent-repo && + git remote add parent parent-repo && + (cd parent-repo && test_commit commit-100) && + git fetch parent && + git tag | grep -E "^commit-100$" +' + +# No-effect hook +write_script "$HOOK" <<EOF +cat +EOF +test_expect_success 'no-op hook' ' + (cd parent-repo && test_commit commit-200) && + git fetch parent && + git tag | grep -E "^commit-200$" +' + +# Ref-renaming hook +write_script "$HOOK" <<EOF +sed 's/commit-/tag-/g' +EOF +test_expect_success 'ref-renaming hook' ' + (cd parent-repo && test_commit commit-300) && + git fetch parent && + git tag | grep -E "^tag-300" && + ! git tag | grep -E "^commit-300" +' + +# Drop branch +write_script "$HOOK" <<EOF +cat +EOF +test_expect_success 'dropping hook setup' ' + (cd parent-repo && test_commit commit-400) && + git fetch parent && + test "$(git rev-parse parent/master)" = "$(git rev-parse commit-400)" +' +write_script "$HOOK" <<EOF +grep -v 'refs/remotes/parent/master' +exit 0 +EOF +test_expect_success 'dropping hook' ' + (cd parent-repo && test_commit commit-401) && + git fetch parent && + test "$(git rev-parse parent/master)" = "$(git rev-parse commit-400)" && + chmod -x "'"$HOOK"'" && + git fetch parent && + test "$(git rev-parse parent/master)" = "$(git rev-parse commit-401)" +' + +# Repointing hook +write_script "$HOOK" <<EOF +cat +EOF +test_expect_success 'repointing hook setup' ' + (cd parent-repo && test_commit commit-500) && + git fetch parent +' +write_script "$HOOK" <<'EOF' +while read hash merge remote_ref local_ref; do + if [ "$local_ref" = "refs/remotes/parent/master" ]; then + repointed="$(git rev-parse "$hash^")" + echo "$repointed $merge $remote_ref $local_ref" + else + echo "$hash $merge $remote_ref $local_ref" + fi +done +exit 0 +EOF +test_expect_success 'repointing hook' ' + (cd parent-repo && test_commit commit-501 && test_commit commit-502) && + git fetch parent && + test "$(git rev-parse parent/master)" = "$(git rev-parse commit-501)" && + (cd parent-repo && test_commit commit-503) && + git fetch parent && + test "$(git rev-parse parent/master)" = "$(git rev-parse commit-502)" +' + +test_done diff --git a/templates/hooks--tweak-fetch.sample b/templates/hooks--tweak-fetch.sample new file mode 100755 index 000000000..93b86ad2f --- /dev/null +++ b/templates/hooks--tweak-fetch.sample @@ -0,0 +1,24 @@ +#!/bin/sh +# +# Copyright (c) 2018 Leo Gaspard +# +# The "tweak-fetch" hook is run during the fetching process. It is called with +# no parameters. Its communication protocol is reading fetched references on +# stdin, and outputting references to update on stdout, with the same protocol +# described in `git help hooks`. +# +# This sample shows how to refuse fetching any unsigned commit. + +while read hash merge remote_ref local_ref; do + allowed_commit="$(git rev-parse "$local_ref")" + git rev-list "$local_ref..$hash" | tac | while read commit; do + if git verify-commit "$commit" > /dev/null 2>&1; then + allowed_commit="$commit" + else + echo "Commit '$commit' is not signed! Refusing to fetch past it" >&2 + break + fi + done + echo "$allowed_commit $merge $remote_ref $local_ref" +done +# TODO: actually verify this hook works -- 2.16.1 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] fetch: add tweak-fetch hook 2018-02-09 21:44 ` [PATCH 2/2] fetch: add " Leo Gaspard @ 2018-02-09 22:40 ` Junio C Hamano 0 siblings, 0 replies; 38+ messages in thread From: Junio C Hamano @ 2018-02-09 22:40 UTC (permalink / raw) To: Leo Gaspard Cc: git, Joey Hess, Ævar Arnfjörð Bjarmason, Johannes Sixt Leo Gaspard <leo@gaspard.io> writes: > +tweak-fetch > +~~~~~~~~~~~ > + > +This hook is invoked by 'git fetch' (commonly called by 'git pull'), after refs > +have been fetched from the remote repository. It is not executed, if nothing was > +fetched. Need to tighten explanation of "nothing was fetched". If the only change I made to my repository is that I created a new branch that points at an existing object since last time you fetched, you would obtain no new object when you fetch from me. Would that count as "nothing was fetched"? Or would it be still fetching something (i.e. your remote-tracking hierarchy will record the fact that I now have this new branch)? > + <sha1> SP not-for-merge|merge|ignore SP <remote-refname> SP <local-refname> LF > + ... > +The `<remote-refname>` is the remote's name for the ref that was fetched, and > +`<local-refname>` is a name of a remote-tracking branch, like > +"refs/remotes/origin/master". `<local-refname>` can be undefined if the fetched > +ref is not being stored in a local refname. In this case, it will be set to `@`, Don't use "@"; leave it empty instead. > +TODO: Add documentation for the “ignore” parameter. Unfortunately, I'm not > +really sure I get what this does or what invariants it is supposed to maintain > +(eg. all “ignore” updates at the end of the refs list?), so this may also > +require code changes. If you are not using the feature, wouldn't it make more sense not to add it in the first place? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] fetch: preparations for tweak-fetch hook 2018-02-09 21:44 ` [PATCH 1/2] fetch: preparations for " Leo Gaspard 2018-02-09 21:44 ` [PATCH 2/2] fetch: add " Leo Gaspard @ 2018-02-09 22:34 ` Junio C Hamano 1 sibling, 0 replies; 38+ messages in thread From: Junio C Hamano @ 2018-02-09 22:34 UTC (permalink / raw) To: Leo Gaspard Cc: git, Joey Hess, Ævar Arnfjörð Bjarmason, Johannes Sixt Leo Gaspard <leo@gaspard.io> writes: > From: Léo Gaspard <leo@gaspard.io> > > No behavior changes yet, only some groundwork for the next change. > > The refs_result structure combines a status code with a ref map, which > can be NULL even on success. Sorry, but I have absolutely no idea what this sentence wants to do by being here in the log message. "even on success" makes it sound as if it is normal to be NULL when status code != success, but it is not even clear why it is the case, and "can be NULL" implies that it is not always NULL, but it does not make it clear when it is NULL and when it is not NULL when code == success. If you find the need to explain what each field of a new struct you are introducing to the codebase means (and it probably is a good idea for this one), perhaps the right place to do so is in in-code comment next to the struct definition? It seems that you also moved add_existing() in the file, for no apparent reason. > This will be needed when there's a > tweak-fetch hook, because it can filter out all refs, while still > succeeding. > > fetch_refs returns a refs_result, so that it can modify the ref_map. The description keeps saying "ref map", but it is unclear what it is, why it is a good thing to allow the caller "modify" it, and what kind of modification the caller wants to make for what reason. Also, in C, we tend to avoid passing structure by value. It seems that the patch makes a callchain involving backfill_tags() return this struct by value, which goes against the convention. I suspect that it should be trivial to have the caller supply an on-stack instance and pass the pointer to it down the callchain, though. > Based-on-patch-by: Joey Hess <joey@kitenet.net> > Signed-off-by: Leo Gaspard <leo@gaspard.io> > --- > builtin/fetch.c | 68 +++++++++++++++++++++++++++++++++++++-------------------- > 1 file changed, 44 insertions(+), 24 deletions(-) > > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 7bbcd26fa..76dc05f61 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -34,6 +34,11 @@ enum { > TAGS_SET = 2 > }; > > +struct refs_result { > + struct ref *new_refs; > + int status; > +}; > + > static int fetch_prune_config = -1; /* unspecified */ > static int prune = -1; /* unspecified */ > #define PRUNE_BY_DEFAULT 0 /* do we prune by default? */ > @@ -57,6 +62,18 @@ static int shown_url = 0; > static int refmap_alloc, refmap_nr; > static const char **refmap_array; > > +static int add_existing(const char *refname, const struct object_id *oid, > + int flag, void *cbdata) > +{ > + struct string_list *list = (struct string_list *)cbdata; > + struct string_list_item *item = string_list_insert(list, refname); > + struct object_id *old_oid = xmalloc(sizeof(*old_oid)); > + > + oidcpy(old_oid, oid); > + item->util = old_oid; > + return 0; > +} > + > static int git_fetch_config(const char *k, const char *v, void *cb) > { > if (!strcmp(k, "fetch.prune")) { > @@ -217,18 +234,6 @@ static void add_merge_config(struct ref **head, > } > } > > -static int add_existing(const char *refname, const struct object_id *oid, > - int flag, void *cbdata) > -{ > - struct string_list *list = (struct string_list *)cbdata; > - struct string_list_item *item = string_list_insert(list, refname); > - struct object_id *old_oid = xmalloc(sizeof(*old_oid)); > - > - oidcpy(old_oid, oid); > - item->util = old_oid; > - return 0; > -} > - > static int will_fetch(struct ref **head, const unsigned char *sha1) > { > struct ref *rm = *head; > @@ -920,15 +925,20 @@ static int quickfetch(struct ref *ref_map) > return check_connected(iterate_ref_map, &rm, &opt); > } > > -static int fetch_refs(struct transport *transport, struct ref *ref_map) > +static struct refs_result fetch_refs(struct transport *transport, > + struct ref *ref_map) > { > - int ret = quickfetch(ref_map); > - if (ret) > - ret = transport_fetch_refs(transport, ref_map); > - if (!ret) > - ret |= store_updated_refs(transport->url, > + struct refs_result ret; > + ret.status = quickfetch(ref_map); > + if (ret.status) { > + ret.status = transport_fetch_refs(transport, ref_map); > + } > + if (!ret.status) { > + ret.new_refs = ref_map; > + ret.status |= store_updated_refs(transport->url, > transport->remote->name, > - ref_map); > + ret.new_refs); > + } > transport_unlock_pack(transport); > return ret; > } > @@ -1048,9 +1058,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen) > return transport; > } > > -static void backfill_tags(struct transport *transport, struct ref *ref_map) > +static struct refs_result backfill_tags(struct transport *transport, > + struct ref *ref_map) > { > int cannot_reuse; > + struct refs_result res; > > /* > * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it > @@ -1069,12 +1081,14 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) > transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL); > transport_set_option(transport, TRANS_OPT_DEPTH, "0"); > transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL); > - fetch_refs(transport, ref_map); > + res = fetch_refs(transport, ref_map); > > if (gsecondary) { > transport_disconnect(gsecondary); > gsecondary = NULL; > } > + > + return res; > } > > static int do_fetch(struct transport *transport, > @@ -1083,6 +1097,7 @@ static int do_fetch(struct transport *transport, > struct string_list existing_refs = STRING_LIST_INIT_DUP; > struct ref *ref_map; > struct ref *rm; > + struct refs_result res; > int autotags = (transport->remote->fetch_tags == 1); > int retcode = 0; > > @@ -1135,7 +1150,10 @@ static int do_fetch(struct transport *transport, > transport->url); > } > } > - if (fetch_refs(transport, ref_map)) { > + > + res = fetch_refs(transport, ref_map); > + ref_map = res.new_refs; > + if (res.status) { > free_refs(ref_map); > retcode = 1; > goto cleanup; > @@ -1148,8 +1166,10 @@ static int do_fetch(struct transport *transport, > struct ref **tail = &ref_map; > ref_map = NULL; > find_non_local_tags(transport, &ref_map, &tail); > - if (ref_map) > - backfill_tags(transport, ref_map); > + if (ref_map) { > + res = backfill_tags(transport, ref_map); > + ref_map = res.new_refs; > + } > free_refs(ref_map); > } ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2018-02-20 21:19 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-07 21:56 Fetch-hooks Leo Gaspard 2018-02-07 22:51 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-08 0:06 ` Fetch-hooks Leo Gaspard 2018-02-08 15:30 ` Fetch-hooks Joey Hess 2018-02-08 17:02 ` Fetch-hooks Leo Gaspard 2018-02-08 21:06 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-08 22:18 ` Fetch-hooks Leo Gaspard 2018-02-09 22:04 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-09 22:24 ` Fetch-hooks Leo Gaspard 2018-02-09 22:56 ` Fetch-hooks Ævar Arnfjörð Bjarmason 2018-02-09 22:30 ` Fetch-hooks Jeff King 2018-02-09 22:45 ` Fetch-hooks Junio C Hamano 2018-02-09 23:49 ` Fetch-hooks Leo Gaspard 2018-02-10 0:13 ` Fetch-hooks Jeff King 2018-02-10 0:37 ` Fetch-hooks Leo Gaspard 2018-02-10 1:08 ` Fetch-hooks Junio C Hamano 2018-02-10 1:33 ` Fetch-hooks Leo Gaspard 2018-02-10 18:03 ` Fetch-hooks Leo Gaspard 2018-02-10 12:21 ` Fetch-hooks Jeff King 2018-02-10 18:36 ` Fetch-hooks Leo Gaspard 2018-02-12 19:23 ` Fetch-hooks Brandon Williams 2018-02-13 15:44 ` Fetch-hooks Leo Gaspard 2018-02-14 1:38 ` Fetch-hooks Jeff King 2018-02-14 1:35 ` Fetch-hooks Jeff King 2018-02-14 2:02 ` Fetch-hooks Leo Gaspard 2018-02-19 21:23 ` Fetch-hooks Jeff King 2018-02-19 22:50 ` Fetch-hooks Leo Gaspard 2018-02-20 6:10 ` Fetch-hooks Jacob Keller 2018-02-20 7:42 ` Fetch-hooks Jeff King 2018-02-20 21:19 ` Fetch-hooks Leo Gaspard 2018-02-14 1:46 ` Fetch-hooks Jacob Keller 2018-02-09 19:12 ` Fetch-hooks Leo Gaspard 2018-02-09 20:20 ` Fetch-hooks Joey Hess 2018-02-09 21:28 ` [PATCH 0/2] fetch: add tweak-fetch hook Leo Gaspard 2018-02-09 21:44 ` [PATCH 1/2] fetch: preparations for " Leo Gaspard 2018-02-09 21:44 ` [PATCH 2/2] fetch: add " Leo Gaspard 2018-02-09 22:40 ` Junio C Hamano 2018-02-09 22:34 ` [PATCH 1/2] fetch: preparations for " Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).