git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* Fetch-hooks
@ 2018-02-07 21:56 Leo Gaspard
  2018-02-07 22:51 ` Fetch-hooks Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-07 21:56 UTC (permalink / raw)
  To: git

Hello,

tl;dr: Is there currently a way to have fetch hooks, and if not do you
think it could be a nice feature?

I was in the process of implementing hooks for git that ensure the
repository is always cleanly signed by someone allowed to by the
repository itself. I think I've completed the signature-checking part
[1] and the push hook [2] (even though it isn't really configurable at
the moment).

However, I was starting to think about handling the fetch step, and
couldn't find any fetch hook. Is there one?

If not, would you think it is would be a good idea to add one, that
would eg. be passed the commit-before, commit-after and could block the
changing of the reference if it failed?

The only other solution I could think of is using a separate script for
fetching, but that would be fragile, as the user could always not think
about it well and run a git fetch, breaking the objective that after the
first clone all commits were correctly signature-checked.

Thanks for reading me!
Leo

PS1: I am not subscribed to the ML.

PS2: I've tried asking freenode#git, without success so far.


[1]
https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh

[2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-07 21:56 Fetch-hooks Leo Gaspard
@ 2018-02-07 22:51 ` Ævar Arnfjörð Bjarmason
  2018-02-08  0:06   ` Fetch-hooks Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-07 22:51 UTC (permalink / raw)
  To: Leo Gaspard; +Cc: git


On Wed, Feb 07 2018, Leo Gaspard jotted:

> Hello,
>
> tl;dr: Is there currently a way to have fetch hooks, and if not do you
> think it could be a nice feature?
>
> I was in the process of implementing hooks for git that ensure the
> repository is always cleanly signed by someone allowed to by the
> repository itself. I think I've completed the signature-checking part
> [1] and the push hook [2] (even though it isn't really configurable at
> the moment).
>
> However, I was starting to think about handling the fetch step, and
> couldn't find any fetch hook. Is there one?
>
> If not, would you think it is would be a good idea to add one, that
> would eg. be passed the commit-before, commit-after and could block the
> changing of the reference if it failed?
>
> The only other solution I could think of is using a separate script for
> fetching, but that would be fragile, as the user could always not think
> about it well and run a git fetch, breaking the objective that after the
> first clone all commits were correctly signature-checked.
>
> Thanks for reading me!
> Leo
>
> PS1: I am not subscribed to the ML.
>
> PS2: I've tried asking freenode#git, without success so far.
>
>
> [1]
> https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh
>
> [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push

There is no fetch hook, however you may find that the
post-{checkout,merge} hooks are suitable for what you want to do.

Setting those to some custom comand is a common pattern for
e.g. compiling some assets on "git pull", so you could similarly check
the commits from HEAD, of course those are post-* hooks, so they won't
stop the checkout.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-07 22:51 ` Fetch-hooks Ævar Arnfjörð Bjarmason
@ 2018-02-08  0:06   ` Leo Gaspard
  2018-02-08 15:30     ` Fetch-hooks Joey Hess
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-08  0:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, joey

On 02/07/2018 11:51 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 07 2018, Leo Gaspard jotted:
> 
>> Hello,
>>
>> tl;dr: Is there currently a way to have fetch hooks, and if not do you
>> think it could be a nice feature?
>>
>> I was in the process of implementing hooks for git that ensure the
>> repository is always cleanly signed by someone allowed to by the
>> repository itself. I think I've completed the signature-checking part
>> [1] and the push hook [2] (even though it isn't really configurable at
>> the moment).
>>
>> However, I was starting to think about handling the fetch step, and
>> couldn't find any fetch hook. Is there one?
>>
>> If not, would you think it is would be a good idea to add one, that
>> would eg. be passed the commit-before, commit-after and could block the
>> changing of the reference if it failed?
>>
>> The only other solution I could think of is using a separate script for
>> fetching, but that would be fragile, as the user could always not think
>> about it well and run a git fetch, breaking the objective that after the
>> first clone all commits were correctly signature-checked.
>>
>> Thanks for reading me!
>> Leo
>>
>> PS1: I am not subscribed to the ML.
>>
>> PS2: I've tried asking freenode#git, without success so far.
>>
>>
>> [1]
>> https://github.com/Ekleog/signed-git/blob/master/git-hooks/check-range-signed.sh
>>
>> [2] https://github.com/Ekleog/signed-git/blob/master/git-hooks/pre-push
> 
> There is no fetch hook, however you may find that the
> post-{checkout,merge} hooks are suitable for what you want to do.
> 
> Setting those to some custom comand is a common pattern for
> e.g. compiling some assets on "git pull", so you could similarly check
> the commits from HEAD, of course those are post-* hooks, so they won't
> stop the checkout.

Hmm, I don't think these would fit the bill. For post-merge, simply
because I spend my life rebasing stuff around, and very rarely merge.
For post-checkout, it could work, but then I'd need to keep track
manually of up to where the commits have been checked and to search the
git graph for the latest checked ancestor (as otherwise checking-out
another branch then checking-out the first branch again would likely
trigger a failure, due to the keyring being dynamic), so it would likely
be a dealbreaker, due to the hook becoming too complex to be trusted.

(Just in case you wonder, by “the keyring being dynamic” I mean the PGP
keys allowed to sign commits are stored directly inside the git repository)

That said, I just came upon [1] (esp. the description [2] and the patch
[3]), and wondered: it looks like the patch was abandoned midway in
favor of a hook refactoring. Would you happen to know whether the hook
refactoring eventually took place, and/or whether this patch was
resubmitted later, and/or whether it would still be possible to merge
this now? (not having any experience with git's internals yet, I don't
really know whether these are stupid questions or not)

Thanks!
Leo

PS: Cc'ing Joey, as you most likely know best what eventually happened,
if you can remember it?


[1] https://marc.info/?t=132477041500001&r=1&w=2

[2] https://marc.info/?l=git&m=132483581218382&w=2

[3] https://marc.info/?l=git&m=132486687023893&w=2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-08  0:06   ` Fetch-hooks Leo Gaspard
@ 2018-02-08 15:30     ` Joey Hess
  2018-02-08 17:02       ` Fetch-hooks Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Joey Hess @ 2018-02-08 15:30 UTC (permalink / raw)
  To: Leo Gaspard; +Cc: Ævar Arnfjörð Bjarmason, git

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

Leo Gaspard wrote:
> That said, I just came upon [1] (esp. the description [2] and the patch
> [3]), and wondered: it looks like the patch was abandoned midway in
> favor of a hook refactoring. Would you happen to know whether the hook
> refactoring eventually took place, and/or whether this patch was
> resubmitted later, and/or whether it would still be possible to merge
> this now? (not having any experience with git's internals yet, I don't
> really know whether these are stupid questions or not)
> 
> PS: Cc'ing Joey, as you most likely know best what eventually happened,
> if you can remember it?

I don't remember it well, but reviewing the thread, I think it foundered
on this comment by Junio:

> That use case sounds like that "git fetch" is called as a first class UI,
> which is covered by "git myfetch" (you can call it "git annex fetch")
> wrapper approach, the canonical example of a hook that we explicitly do
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> not want to add.
  ^^^^^^^^^^^^^^^

While I still think a fetch hook would be a good idea for reasons of
composability, I then just went off and implemented such a wrapper for
my own particular use case, and the wrapper program then grew to cover
use cases that a hook would not have been able to cover, so ...

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-08 15:30     ` Fetch-hooks Joey Hess
@ 2018-02-08 17:02       ` Leo Gaspard
  2018-02-08 21:06         ` Fetch-hooks Ævar Arnfjörð Bjarmason
  2018-02-09 19:12         ` Fetch-hooks Leo Gaspard
  0 siblings, 2 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-08 17:02 UTC (permalink / raw)
  To: Joey Hess; +Cc: Ævar Arnfjörð Bjarmason, git

On 02/08/2018 04:30 PM, Joey Hess wrote:
> Leo Gaspard wrote:
>> That said, I just came upon [1] (esp. the description [2] and the patch
>> [3]), and wondered: it looks like the patch was abandoned midway in
>> favor of a hook refactoring. Would you happen to know whether the hook
>> refactoring eventually took place, and/or whether this patch was
>> resubmitted later, and/or whether it would still be possible to merge
>> this now? (not having any experience with git's internals yet, I don't
>> really know whether these are stupid questions or not)
>>
>> PS: Cc'ing Joey, as you most likely know best what eventually happened,
>> if you can remember it?
> 
> I don't remember it well, but reviewing the thread, I think it foundered
> on this comment by Junio:
> 
>> That use case sounds like that "git fetch" is called as a first class UI,
>> which is covered by "git myfetch" (you can call it "git annex fetch")
>> wrapper approach, the canonical example of a hook that we explicitly do
>                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> not want to add.
>   ^^^^^^^^^^^^^^^
> 
> While I still think a fetch hook would be a good idea for reasons of
> composability, I then just went off and implemented such a wrapper for
> my own particular use case, and the wrapper program then grew to cover
> use cases that a hook would not have been able to cover, so ...

Hmm, OK, so I guess I'll try to update the patch when I get some time to
delve into git's internals, as my use case (forbidding some fetches)
couldn't afaik be covered by a wrapper hook.

Thanks for the feedback!
Leo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-08 17:02       ` Fetch-hooks Leo Gaspard
@ 2018-02-08 21:06         ` Ævar Arnfjörð Bjarmason
  2018-02-08 22:18           ` Fetch-hooks Leo Gaspard
  2018-02-09 19:12         ` Fetch-hooks Leo Gaspard
  1 sibling, 1 reply; 38+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-08 21:06 UTC (permalink / raw)
  To: Leo Gaspard; +Cc: Joey Hess, git


On Thu, Feb 08 2018, Leo Gaspard jotted:

> On 02/08/2018 04:30 PM, Joey Hess wrote:
>> Leo Gaspard wrote:
>>> That said, I just came upon [1] (esp. the description [2] and the patch
>>> [3]), and wondered: it looks like the patch was abandoned midway in
>>> favor of a hook refactoring. Would you happen to know whether the hook
>>> refactoring eventually took place, and/or whether this patch was
>>> resubmitted later, and/or whether it would still be possible to merge
>>> this now? (not having any experience with git's internals yet, I don't
>>> really know whether these are stupid questions or not)
>>>
>>> PS: Cc'ing Joey, as you most likely know best what eventually happened,
>>> if you can remember it?
>>
>> I don't remember it well, but reviewing the thread, I think it foundered
>> on this comment by Junio:
>>
>>> That use case sounds like that "git fetch" is called as a first class UI,
>>> which is covered by "git myfetch" (you can call it "git annex fetch")
>>> wrapper approach, the canonical example of a hook that we explicitly do
>>                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> not want to add.
>>   ^^^^^^^^^^^^^^^
>>
>> While I still think a fetch hook would be a good idea for reasons of
>> composability, I then just went off and implemented such a wrapper for
>> my own particular use case, and the wrapper program then grew to cover
>> use cases that a hook would not have been able to cover, so ...
>
> Hmm, OK, so I guess I'll try to update the patch when I get some time to
> delve into git's internals, as my use case (forbidding some fetches)
> couldn't afaik be covered by a wrapper hook.

Per my reading of
https://public-inbox.org/git/20111224234212.GA21533@gnu.kitenet.net/
what Joey implemented is not what you described in your initial mail.

His is a *post*-fetch hook, we've already done the fetch and are just
telling you as a courtesy what refs changed. You could also implement
this as some cronjob that polls git for-each-ref but it's easier as a
hook, fine.

What you're describing is something like a pre-fetch hook analogous to
the pre-receive hooks, where you're fed refs updated on the remote on
stdin, and can say you don't want some of those to be updated.

This may just be a lack of imagination on my part, but I don't see how
that's sensible at all.

The refs we fetch are our *copy* of the remote refs, why would you not
want to track the upstream remote. You're going to refuse some branches
and what? Be further behind until some point in the future where the tip
is GPG-signed and you accept it, at which poich you'll need to do more
work than if you were up-to-date with the almost-GPG-signed version?

I think you're confusing two things here. One is the reasonable concern
of wanting to not have your local copy of remote refs have undesirable
content, but a pre-fetch hook is not the way to accomplish that.

The other is e.g. to ensure that you never locally check out some "bad"
ref, we don't have hook support for that, but could add it,
e.g. git-checkout and git reset --hard could be taught about some
pre-checkout hook.

You could also have some intermediate step between these two, where
e.g. your refspec for "origin" is
"+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
"+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
location, then you move them (with some alias/hook) to
"refs/remotes/origin/*" once they're seen to be "OK".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-08 21:06         ` Fetch-hooks Ævar Arnfjörð Bjarmason
@ 2018-02-08 22:18           ` Leo Gaspard
  2018-02-09 22:04             ` Fetch-hooks Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-08 22:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Joey Hess, git

On 02/08/2018 10:06 PM, Ævar Arnfjörð Bjarmason wrote:>> Hmm, OK, so I
guess I'll try to update the patch when I get some time to
>> delve into git's internals, as my use case (forbidding some fetches)
>> couldn't afaik be covered by a wrapper hook.
> 
> Per my reading of
> https://public-inbox.org/git/20111224234212.GA21533@gnu.kitenet.net/
> what Joey implemented is not what you described in your initial mail.
> 
> His is a *post*-fetch hook, we've already done the fetch and are just
> telling you as a courtesy what refs changed. You could also implement
> this as some cronjob that polls git for-each-ref but it's easier as a
> hook, fine.

I was thinking along the lines of
    https://marc.info/?l=git&m=132486687023893&w=2
with high-level description at
    https://marc.info/?l=git&m=132480559712592&w=2

With the high-level description given here, I'm pretty sure I can hack a
hook together to make things work as I want them to.

> What you're describing is something like a pre-fetch hook analogous to
> the pre-receive hooks, where you're fed refs updated on the remote on
> stdin, and can say you don't want some of those to be updated.
> 
> This may just be a lack of imagination on my part, but I don't see how
> that's sensible at all.
> 
> The refs we fetch are our *copy* of the remote refs, why would you not
> want to track the upstream remote. You're going to refuse some branches
> and what? Be further behind until some point in the future where the tip
> is GPG-signed and you accept it, at which poich you'll need to do more
> work than if you were up-to-date with the almost-GPG-signed version?

That's about it. I want all fetching to be blocked in case of the tip
not being signed. As there is a pre-push hook ensuring committers don't
forget to sign before pushing, the only case the tip could not be signed
is in case of an attack, which means it's better to just force-push
master because any git repo that fetched it is doomed anyway. Definitely
would not want to allow an untrusted revision get into anything that
could even remotely be taken as “endorsed” by the user.

(BTW, in order to avoid the case of someone forgetting to sign the
commit and not having installed the pre-push hook, there can be holes in
the commit-signing chain, the drawback being that the committer pushing
a signed commit takes responsibility for all unsigned commits directly
preceding his -- allowing them to recover in case of a mistaken push)

> I think you're confusing two things here. One is the reasonable concern
> of wanting to not have your local copy of remote refs have undesirable
> content, but a pre-fetch hook is not the way to accomplish that.

Well, a pre-fetch hook is a possible way of accomplishing that, and I
don't know of any better one?

> The other is e.g. to ensure that you never locally check out some "bad"
> ref, we don't have hook support for that, but could add it,
> e.g. git-checkout and git reset --hard could be taught about some
> pre-checkout hook.

Issue is, once we have to fix checkout and reset, all other commands
that potentially touch the worktree also have to be fixed (eg. I don't
know whether worktree add triggers pre-checkout?)

Also, this requires the hook to store a database of all the paths that
have been checked, because there is no logic in how one may choose to
checkout the repo. While having a tweak-fetch hook would make the
implementation straightforward, because at the time of invoking the hook
the “refname at remote” commit is already trusted, and the “object name”
is the commit whose validity we want to check, so we just have to check
the path between those two. (I don't know if you checked my current
scripts, but basically as the set of allowed PGP keys can change at any
commit, it's only possible to check a commit path, not a single commit
out-of-nowhere)

The only issue that could arise with a tweak-fetch hook is in case of a
force-fetch (and even maybe it's not even an actual issue, I haven't
given it real thought yet), but this can reasonably be banned, as once a
commit is signed it enters the “real” master branch, that should never
be moved backward, as it can't be the sign of an attack.

> You could also have some intermediate step between these two, where
> e.g. your refspec for "origin" is
> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
> location, then you move them (with some alias/hook) to
> "refs/remotes/origin/*" once they're seen to be "OK".

That is indeed another possibility, but then the idea is to make things
as transparent as possible for the end-user, not to completely change
their git workflow. As such, considering only signed commits to be part
of the upstream seems to make sense to me?

Cheers,
Leo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-08 17:02       ` Fetch-hooks Leo Gaspard
  2018-02-08 21:06         ` Fetch-hooks Ævar Arnfjörð Bjarmason
@ 2018-02-09 19:12         ` Leo Gaspard
  2018-02-09 20:20           ` Fetch-hooks Joey Hess
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-09 19:12 UTC (permalink / raw)
  To: Joey Hess; +Cc: Ævar Arnfjörð Bjarmason, git

On 02/08/2018 06:02 PM, Leo Gaspard wrote:
> On 02/08/2018 04:30 PM, Joey Hess wrote:
>> [...]
> 
> Hmm, OK, so I guess I'll try to update the patch when I get some time to
> delve into git's internals, as my use case (forbidding some fetches)
> couldn't afaik be covered by a wrapper hook.

Joey,

I just wanted to check, you did not put the Signed-off-by line in
patches in https://marc.info/?l=git&m=132491485901482&w=2

Could you confirm that the patches you sent are “covered under an
appropriate open source license and I have the right under that license
to submit that work with modifications, whether created in whole or in
part by me, under the same open source license (unless I am permitted to
submit under a different license), as indicated in the file”, so that I
could send the patch I wrote based on yours with a Signed-off-by line of
my own without breaking the DCO?

Thanks!
Leo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 19:12         ` Fetch-hooks Leo Gaspard
@ 2018-02-09 20:20           ` Joey Hess
  2018-02-09 21:28             ` [PATCH 0/2] fetch: add tweak-fetch hook Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Joey Hess @ 2018-02-09 20:20 UTC (permalink / raw)
  To: Leo Gaspard; +Cc: Ævar Arnfjörð Bjarmason, git

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

Leo Gaspard wrote:
> I just wanted to check, you did not put the Signed-off-by line in
> patches in https://marc.info/?l=git&m=132491485901482&w=2
> 
> Could you confirm that the patches you sent are “covered under an
> appropriate open source license and I have the right under that license
> to submit that work with modifications, whether created in whole or in
> part by me, under the same open source license (unless I am permitted to
> submit under a different license), as indicated in the file”, so that I
> could send the patch I wrote based on yours with a Signed-off-by line of
> my own without breaking the DCO?

Yes; my patches are under the same GPL-2 as the rest of git.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 0/2] fetch: add tweak-fetch hook
  2018-02-09 20:20           ` Fetch-hooks Joey Hess
@ 2018-02-09 21:28             ` Leo Gaspard
  2018-02-09 21:44               ` [PATCH 1/2] fetch: preparations for " Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-09 21:28 UTC (permalink / raw)
  To: git
  Cc: Joey Hess, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Johannes Sixt

On 02/09/2018 09:20 PM, Joey Hess wrote:> Yes; my patches are under the
same GPL-2 as the rest of git.

Thanks! So here comes my patch series, heavily based on yours.

There are some things to bear in mind while reviewing it:

 * This is my first real attempt at contributing to git, which means I
could be very very far off-track

 * Most of it is based on trying to make the 6-year-old patch series
work and pass all the tests, so if a new feature has been added since
then I likely didn't notice it or don't know how to handle it correctly

There are still three TODO's in the code:

 * In the documentation, one stating that I don't really get what this
“ignore” parameter exactly does, and whether it should be handled
specially (a prime example of a new feature I'm not really sure how to
handle, somewhere in the code it's written all the “ignore” references
are at the end of the list, but I'm already not self-confident enough
about the difference between “merge” and “not-for-merge” to even
consider making a good choice about how to handle “ignore”)

 * In `builtins/fetch.c`, function `do_fetch`, there is a conflict of
interest between placing the `prune` before the `fetch` (as done by
commit 10a6cc889 ("fetch --prune: Run prune before fetching",
2014-01-02)), and placing the `fetch` before the `prune` (which would
allow hooks that rename the local-ref to not be prune'd and then
re-fetched when doing a `git fetch --prune` -- without that a hook that
would want to both read the old commit information and rename the
local-ref would not be able to). Or maybe this question means actually
there should be a third solution? but I don't really know what. Maybe
also hooking into the prune operation?

 * In `templates/hooks--tweak-fetch.sample`, the “check this actually
works” todo, as I'd rather first check this patch series is not too far
off-topic before doing non-essential work -- anyway another version of
the patch series will be required for the other two TODO's, so I can fix
it at this point.

That being said, what do you think about these patches?

Thanks for your time!
Leo Gaspard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/2] fetch: preparations for tweak-fetch hook
  2018-02-09 21:28             ` [PATCH 0/2] fetch: add tweak-fetch hook Leo Gaspard
@ 2018-02-09 21:44               ` " Leo Gaspard
  2018-02-09 21:44                 ` [PATCH 2/2] fetch: add " Leo Gaspard
  2018-02-09 22:34                 ` [PATCH 1/2] fetch: preparations for " Junio C Hamano
  0 siblings, 2 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-09 21:44 UTC (permalink / raw)
  To: git
  Cc: Joey Hess, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Johannes Sixt, Léo Gaspard

From: Léo Gaspard <leo@gaspard.io>

No behavior changes yet, only some groundwork for the next change.

The refs_result structure combines a status code with a ref map, which
can be NULL even on success. This will be needed when there's a
tweak-fetch hook, because it can filter out all refs, while still
succeeding.

fetch_refs returns a refs_result, so that it can modify the ref_map.

Based-on-patch-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Leo Gaspard <leo@gaspard.io>
---
 builtin/fetch.c | 68 +++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7bbcd26fa..76dc05f61 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -34,6 +34,11 @@ enum {
 	TAGS_SET = 2
 };
 
+struct refs_result {
+	struct ref *new_refs;
+	int status;
+};
+
 static int fetch_prune_config = -1; /* unspecified */
 static int prune = -1; /* unspecified */
 #define PRUNE_BY_DEFAULT 0 /* do we prune by default? */
@@ -57,6 +62,18 @@ static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
 
+static int add_existing(const char *refname, const struct object_id *oid,
+			int flag, void *cbdata)
+{
+	struct string_list *list = (struct string_list *)cbdata;
+	struct string_list_item *item = string_list_insert(list, refname);
+	struct object_id *old_oid = xmalloc(sizeof(*old_oid));
+
+	oidcpy(old_oid, oid);
+	item->util = old_oid;
+	return 0;
+}
+
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
 	if (!strcmp(k, "fetch.prune")) {
@@ -217,18 +234,6 @@ static void add_merge_config(struct ref **head,
 	}
 }
 
-static int add_existing(const char *refname, const struct object_id *oid,
-			int flag, void *cbdata)
-{
-	struct string_list *list = (struct string_list *)cbdata;
-	struct string_list_item *item = string_list_insert(list, refname);
-	struct object_id *old_oid = xmalloc(sizeof(*old_oid));
-
-	oidcpy(old_oid, oid);
-	item->util = old_oid;
-	return 0;
-}
-
 static int will_fetch(struct ref **head, const unsigned char *sha1)
 {
 	struct ref *rm = *head;
@@ -920,15 +925,20 @@ static int quickfetch(struct ref *ref_map)
 	return check_connected(iterate_ref_map, &rm, &opt);
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static struct refs_result fetch_refs(struct transport *transport,
+		struct ref *ref_map)
 {
-	int ret = quickfetch(ref_map);
-	if (ret)
-		ret = transport_fetch_refs(transport, ref_map);
-	if (!ret)
-		ret |= store_updated_refs(transport->url,
+	struct refs_result ret;
+	ret.status = quickfetch(ref_map);
+	if (ret.status) {
+		ret.status = transport_fetch_refs(transport, ref_map);
+	}
+	if (!ret.status) {
+		ret.new_refs = ref_map;
+		ret.status |= store_updated_refs(transport->url,
 				transport->remote->name,
-				ref_map);
+				ret.new_refs);
+	}
 	transport_unlock_pack(transport);
 	return ret;
 }
@@ -1048,9 +1058,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 	return transport;
 }
 
-static void backfill_tags(struct transport *transport, struct ref *ref_map)
+static struct refs_result backfill_tags(struct transport *transport,
+		struct ref *ref_map)
 {
 	int cannot_reuse;
+	struct refs_result res;
 
 	/*
 	 * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it
@@ -1069,12 +1081,14 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
 	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
 	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-	fetch_refs(transport, ref_map);
+	res = fetch_refs(transport, ref_map);
 
 	if (gsecondary) {
 		transport_disconnect(gsecondary);
 		gsecondary = NULL;
 	}
+
+	return res;
 }
 
 static int do_fetch(struct transport *transport,
@@ -1083,6 +1097,7 @@ static int do_fetch(struct transport *transport,
 	struct string_list existing_refs = STRING_LIST_INIT_DUP;
 	struct ref *ref_map;
 	struct ref *rm;
+	struct refs_result res;
 	int autotags = (transport->remote->fetch_tags == 1);
 	int retcode = 0;
 
@@ -1135,7 +1150,10 @@ static int do_fetch(struct transport *transport,
 				   transport->url);
 		}
 	}
-	if (fetch_refs(transport, ref_map)) {
+
+	res = fetch_refs(transport, ref_map);
+	ref_map = res.new_refs;
+	if (res.status) {
 		free_refs(ref_map);
 		retcode = 1;
 		goto cleanup;
@@ -1148,8 +1166,10 @@ static int do_fetch(struct transport *transport,
 		struct ref **tail = &ref_map;
 		ref_map = NULL;
 		find_non_local_tags(transport, &ref_map, &tail);
-		if (ref_map)
-			backfill_tags(transport, ref_map);
+		if (ref_map) {
+			res = backfill_tags(transport, ref_map);
+			ref_map = res.new_refs;
+		}
 		free_refs(ref_map);
 	}
 
-- 
2.16.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 2/2] fetch: add tweak-fetch hook
  2018-02-09 21:44               ` [PATCH 1/2] fetch: preparations for " Leo Gaspard
@ 2018-02-09 21:44                 ` " Leo Gaspard
  2018-02-09 22:40                   ` Junio C Hamano
  2018-02-09 22:34                 ` [PATCH 1/2] fetch: preparations for " Junio C Hamano
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-09 21:44 UTC (permalink / raw)
  To: git
  Cc: Joey Hess, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Johannes Sixt, Léo Gaspard

From: Léo Gaspard <leo@gaspard.io>

The tweak-fetch hook is fed lines on stdin for all refs that were
fetched, and outputs on stdout possibly modified lines. Its output is
then parsed and used when `git fetch` updates the remote tracking refs,
records the entries in FETCH_HEAD, and produces its report.

The modifications here are heavily based on prior work by Joey Hess.

Based-on-patch-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Leo Gaspard <leo@gaspard.io>
---
 Documentation/githooks.txt          |  37 +++++++
 builtin/fetch.c                     | 210 +++++++++++++++++++++++++++++++++++-
 t/t5574-fetch-tweak-fetch-hook.sh   |  90 ++++++++++++++++
 templates/hooks--tweak-fetch.sample |  24 +++++
 4 files changed, 359 insertions(+), 2 deletions(-)
 create mode 100755 t/t5574-fetch-tweak-fetch-hook.sh
 create mode 100755 templates/hooks--tweak-fetch.sample

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index f877f7b7c..1b4a18bf0 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -177,6 +177,43 @@ This hook can be used to perform repository validity checks, auto-display
 differences from the previous HEAD if different, or set working dir metadata
 properties.
 
+tweak-fetch
+~~~~~~~~~~~
+
+This hook is invoked by 'git fetch' (commonly called by 'git pull'), after refs
+have been fetched from the remote repository. It is not executed, if nothing was
+fetched.
+
+The output of the hook is used to update the remote-tracking branches, and
+`.git/FETCH_HEAD`, in preparation for a later merge operation done by 'git
+merge'.
+
+It takes no arguments, but is fed a line of the following format on its standard
+input for each ref that was fetched.
+
+  <sha1> SP not-for-merge|merge|ignore SP <remote-refname> SP <local-refname> LF
+
+Where the "not-for-merge" flag indicates the ref is not to be merged into the
+current branch, and the "merge" flag indicates that 'git merge' should later
+merge it.
+
+The `<remote-refname>` is the remote's name for the ref that was fetched, and
+`<local-refname>` is a name of a remote-tracking branch, like
+"refs/remotes/origin/master". `<local-refname>` can be undefined if the fetched
+ref is not being stored in a local refname. In this case, it will be set to `@`,
+an invalide refspec, so that scripts can be written more easily.
+
+TODO: Add documentation for the “ignore” parameter. Unfortunately, I'm not
+really sure I get what this does or what invariants it is supposed to maintain
+(eg. all “ignore” updates at the end of the refs list?), so this may also
+require code changes.
+
+The hook must consume all of its standard input, and output back lines of the
+same format. It can modify its input as desired, including adding or removing
+lines, updating the sha1 (i.e. re-point the remote-tracking branch), changing
+the merge flag, and changing the `<local-refname>` (i.e. use different
+remote-tracking branch).
+
 post-merge
 ~~~~~~~~~~
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 76dc05f61..1bb394530 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -28,6 +28,8 @@ static const char * const builtin_fetch_usage[] = {
 	NULL
 };
 
+static const char tweak_fetch_hook[] = "tweak-fetch";
+
 enum {
 	TAGS_UNSET = 0,
 	TAGS_DEFAULT = 1,
@@ -181,6 +183,206 @@ static struct option builtin_fetch_options[] = {
 	OPT_END()
 };
 
+static int feed_tweak_fetch_hook(int in, int out, void *data)
+{
+	struct ref *ref;
+	struct strbuf buf = STRBUF_INIT;
+	const char *kw, *peer_ref;
+	char oid_buf[GIT_SHA1_HEXSZ + 1];
+	int ret;
+
+	for (ref = data; ref; ref = ref->next) {
+		if (ref->fetch_head_status == FETCH_HEAD_MERGE)
+			kw = "merge";
+		else if (ref->fetch_head_status == FETCH_HEAD_IGNORE)
+			kw = "ignore";
+		else
+			kw = "not-for-merge";
+		if (!ref->name)
+			die("trying to fetch an inexistant ref");
+		if (ref->peer_ref && ref->peer_ref->name)
+			peer_ref = ref->peer_ref->name;
+		else
+			peer_ref = "@";
+		strbuf_addf(&buf, "%s %s %s %s\n",
+				oid_to_hex_r(oid_buf, &ref->old_oid), kw,
+				ref->name, peer_ref);
+	}
+
+	ret = write_in_full(out, buf.buf, buf.len) != buf.len;
+	if (ret)
+		warning("%s hook failed to consume all its input",
+				tweak_fetch_hook);
+	close(out);
+	strbuf_release(&buf);
+	return ret;
+}
+
+static struct ref *parse_tweak_fetch_hook_line(char *l,
+		struct string_list *existing_refs)
+{
+	struct ref *ref = NULL, *peer_ref = NULL;
+	struct string_list_item *peer_item = NULL;
+	char *words[4];
+	int i, word = 0;
+	char *problem;
+
+	for (i = 0; l[i]; i++) {
+		if (isspace(l[i])) {
+			l[i] = '\0';
+			words[word] = l;
+			l += i + 1;
+			i = 0;
+			word++;
+			if (word > 3) {
+				problem = "too many words";
+				goto unparsable;
+			}
+		}
+	}
+	if (word < 3) {
+		problem = "not enough words";
+		goto unparsable;
+	}
+
+	ref = alloc_ref(words[2]);
+	peer_ref = ref->peer_ref = alloc_ref(l);
+	ref->peer_ref->force = 1;
+
+	if (get_oid_hex(words[0], &ref->old_oid)) {
+		problem="bad oid";
+		goto unparsable;
+	}
+
+	if (!strcmp(words[1], "merge")) {
+		ref->fetch_head_status = FETCH_HEAD_MERGE;
+	} else if (!strcmp(words[1], "ignore")) {
+		ref->fetch_head_status = FETCH_HEAD_IGNORE;
+	} else if (!strcmp(words[1], "not-for-merge")) {
+		ref->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
+	} else {
+		problem = "bad merge flag";
+		goto unparsable;
+	}
+
+	peer_item = string_list_lookup(existing_refs, peer_ref->name);
+	if (peer_item)
+		hashcpy(peer_ref->old_oid.hash, peer_item->util);
+
+	return ref;
+
+unparsable:
+	warning("%s hook output a wrongly formed line: %s",
+			tweak_fetch_hook, problem);
+	free(ref);
+	free(peer_ref);
+	return NULL;
+}
+
+static struct refs_result read_tweak_fetch_hook(int in)
+{
+	struct refs_result res;
+	FILE *f;
+	struct strbuf buf;
+	struct string_list existing_refs = STRING_LIST_INIT_DUP;
+	struct ref *ref, *prevref = NULL;
+
+	res.status = 0;
+	res.new_refs = NULL;
+
+	f = fdopen(in, "r");
+	if (f == NULL) {
+		res.status = 1;
+		return res;
+	}
+
+	strbuf_init(&buf, 128);
+	for_each_ref(add_existing, &existing_refs);
+
+	while (strbuf_getline(&buf, f) != EOF) {
+		char *l = strbuf_detach(&buf, NULL);
+		ref = parse_tweak_fetch_hook_line(l, &existing_refs);
+		if (!ref) {
+			res.status = 1;
+		} else {
+			if (prevref) {
+				prevref->next = ref;
+				prevref = ref;
+			} else {
+				res.new_refs = prevref = ref;
+			}
+		}
+		free(l);
+	}
+
+	string_list_clear(&existing_refs, 0);
+	strbuf_release(&buf);
+	fclose(f);
+	return res;
+}
+
+/*
+ * The hook is fed lines of the form:
+ * <sha1> SP <not-for-merge|merge|ignore> SP <remote-refname> SP <local-refname> LF
+ * And should output rewritten lines of the same form.
+ */
+static struct ref *run_tweak_fetch_hook(struct ref *fetched_refs)
+{
+	struct child_process hook;
+	const char *argv[2];
+	struct async async;
+	struct refs_result res;
+
+	if (!fetched_refs)
+		return fetched_refs;
+
+	argv[0] = find_hook(tweak_fetch_hook);
+	if (access(argv[0], X_OK) < 0)
+		return fetched_refs;
+	argv[1] = NULL;
+
+	memset(&hook, 0, sizeof(hook));
+	hook.argv = argv;
+	hook.in = -1;
+	hook.out = -1;
+	if (start_command(&hook))
+		return fetched_refs;
+
+	/*
+	 * Use an async writer to feed the hook process.
+	 * This allows the hook to read and write a line at
+	 * a time without blocking.
+	 */
+	memset(&async, 0, sizeof(async));
+	async.proc = feed_tweak_fetch_hook;
+	async.data = fetched_refs;
+	async.out = hook.in;
+	if (start_async(&async)) {
+		close(hook.in);
+		close(hook.out);
+		finish_command(&hook);
+		return fetched_refs;
+	}
+
+	res = read_tweak_fetch_hook(hook.out);
+	res.status |= finish_async(&async);
+	res.status |= finish_command(&hook);
+
+	if (res.status) {
+		warning("%s hook failed, ignoring its output", tweak_fetch_hook);
+		free(res.new_refs);
+		return fetched_refs;
+	} else {
+		/*
+		 * The new_refs are returned, to be used in place of
+		 * fetched_refs, so it is not needed anymore and can
+		 * be freed here.
+		 */
+		free_refs(fetched_refs);
+		return res.new_refs;
+	}
+}
+
 static void unlock_pack(void)
 {
 	if (gtransport)
@@ -934,7 +1136,7 @@ static struct refs_result fetch_refs(struct transport *transport,
 		ret.status = transport_fetch_refs(transport, ref_map);
 	}
 	if (!ret.status) {
-		ret.new_refs = ref_map;
+		ret.new_refs = run_tweak_fetch_hook(ref_map);
 		ret.status |= store_updated_refs(transport->url,
 				transport->remote->name,
 				ret.new_refs);
@@ -1150,7 +1352,11 @@ static int do_fetch(struct transport *transport,
 				   transport->url);
 		}
 	}
-
+	// TODO(?): Were this placed above the `if (prune)`, it would avoid the
+	// unfortunate fact that `git fetch --prune` first drops the ref then
+	// re-adds it (in cases where the tweak-fetch hook renames it). There is
+	// likely a better solution than this one that would break Commit
+	// 10a6cc889 ("fetch --prune: Run prune before fetching", 2014-01-02)
 	res = fetch_refs(transport, ref_map);
 	ref_map = res.new_refs;
 	if (res.status) {
diff --git a/t/t5574-fetch-tweak-fetch-hook.sh b/t/t5574-fetch-tweak-fetch-hook.sh
new file mode 100755
index 000000000..17cf52684
--- /dev/null
+++ b/t/t5574-fetch-tweak-fetch-hook.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+
+test_description='testing tweak-fetch-hook'
+. ./test-lib.sh
+
+HOOKDIR="$(git rev-parse --git-dir)/hooks"
+HOOK="$HOOKDIR/tweak-fetch"
+mkdir -p "$HOOKDIR"
+
+# Setup
+test_expect_success 'setup' '
+	git init parent-repo &&
+	git remote add parent parent-repo &&
+	(cd parent-repo && test_commit commit-100) &&
+	git fetch parent &&
+	git tag | grep -E "^commit-100$"
+'
+
+# No-effect hook
+write_script "$HOOK" <<EOF
+cat
+EOF
+test_expect_success 'no-op hook' '
+	(cd parent-repo && test_commit commit-200) &&
+	git fetch parent &&
+	git tag | grep -E "^commit-200$"
+'
+
+# Ref-renaming hook
+write_script "$HOOK" <<EOF
+sed 's/commit-/tag-/g'
+EOF
+test_expect_success 'ref-renaming hook' '
+	(cd parent-repo && test_commit commit-300) &&
+	git fetch parent &&
+	git tag | grep -E "^tag-300" &&
+	! git tag | grep -E "^commit-300"
+'
+
+# Drop branch
+write_script "$HOOK" <<EOF
+cat
+EOF
+test_expect_success 'dropping hook setup' '
+	(cd parent-repo && test_commit commit-400) &&
+	git fetch parent &&
+	test "$(git rev-parse parent/master)" = "$(git rev-parse commit-400)"
+'
+write_script "$HOOK" <<EOF
+grep -v 'refs/remotes/parent/master'
+exit 0
+EOF
+test_expect_success 'dropping hook' '
+	(cd parent-repo && test_commit commit-401) &&
+	git fetch parent &&
+	test "$(git rev-parse parent/master)" = "$(git rev-parse commit-400)" &&
+	chmod -x "'"$HOOK"'" &&
+	git fetch parent &&
+	test "$(git rev-parse parent/master)" = "$(git rev-parse commit-401)"
+'
+
+# Repointing hook
+write_script "$HOOK" <<EOF
+cat
+EOF
+test_expect_success 'repointing hook setup' '
+	(cd parent-repo && test_commit commit-500) &&
+	git fetch parent
+'
+write_script "$HOOK" <<'EOF'
+while read hash merge remote_ref local_ref; do
+	if [ "$local_ref" = "refs/remotes/parent/master" ]; then
+		repointed="$(git rev-parse "$hash^")"
+		echo "$repointed $merge $remote_ref $local_ref"
+	else
+		echo "$hash $merge $remote_ref $local_ref"
+	fi
+done
+exit 0
+EOF
+test_expect_success 'repointing hook' '
+	(cd parent-repo && test_commit commit-501 && test_commit commit-502) &&
+	git fetch parent &&
+	test "$(git rev-parse parent/master)" = "$(git rev-parse commit-501)" &&
+	(cd parent-repo && test_commit commit-503) &&
+	git fetch parent &&
+	test "$(git rev-parse parent/master)" = "$(git rev-parse commit-502)"
+'
+
+test_done
diff --git a/templates/hooks--tweak-fetch.sample b/templates/hooks--tweak-fetch.sample
new file mode 100755
index 000000000..93b86ad2f
--- /dev/null
+++ b/templates/hooks--tweak-fetch.sample
@@ -0,0 +1,24 @@
+#!/bin/sh
+#
+# Copyright (c) 2018 Leo Gaspard
+#
+# The "tweak-fetch" hook is run during the fetching process. It is called with
+# no parameters. Its communication protocol is reading fetched references on
+# stdin, and outputting references to update on stdout, with the same protocol
+# described in `git help hooks`.
+#
+# This sample shows how to refuse fetching any unsigned commit.
+
+while read hash merge remote_ref local_ref; do
+    allowed_commit="$(git rev-parse "$local_ref")"
+    git rev-list "$local_ref..$hash" | tac | while read commit; do
+        if git verify-commit "$commit" > /dev/null 2>&1; then
+            allowed_commit="$commit"
+        else
+            echo "Commit '$commit' is not signed! Refusing to fetch past it" >&2
+            break
+        fi
+    done
+    echo "$allowed_commit $merge $remote_ref $local_ref"
+done
+# TODO: actually verify this hook works
-- 
2.16.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-08 22:18           ` Fetch-hooks Leo Gaspard
@ 2018-02-09 22:04             ` Ævar Arnfjörð Bjarmason
  2018-02-09 22:24               ` Fetch-hooks Leo Gaspard
  2018-02-09 22:30               ` Fetch-hooks Jeff King
  0 siblings, 2 replies; 38+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-09 22:04 UTC (permalink / raw)
  To: Leo Gaspard; +Cc: Joey Hess, git, Jeff King, Brandon Williams


On Thu, Feb 08 2018, Leo Gaspard jotted:

> On 02/08/2018 10:06 PM, Ævar Arnfjörð Bjarmason wrote:>> Hmm, OK, so I
> guess I'll try to update the patch when I get some time to
>>> delve into git's internals, as my use case (forbidding some fetches)
>>> couldn't afaik be covered by a wrapper hook.
>>
>> Per my reading of
>> https://public-inbox.org/git/20111224234212.GA21533@gnu.kitenet.net/
>> what Joey implemented is not what you described in your initial mail.
>>
>> His is a *post*-fetch hook, we've already done the fetch and are just
>> telling you as a courtesy what refs changed. You could also implement
>> this as some cronjob that polls git for-each-ref but it's easier as a
>> hook, fine.
>
> I was thinking along the lines of
>     https://marc.info/?l=git&m=132486687023893&w=2
> with high-level description at
>     https://marc.info/?l=git&m=132480559712592&w=2
>
> With the high-level description given here, I'm pretty sure I can hack a
> hook together to make things work as I want them to.
>
>> What you're describing is something like a pre-fetch hook analogous to
>> the pre-receive hooks, where you're fed refs updated on the remote on
>> stdin, and can say you don't want some of those to be updated.
>>
>> This may just be a lack of imagination on my part, but I don't see how
>> that's sensible at all.
>>
>> The refs we fetch are our *copy* of the remote refs, why would you not
>> want to track the upstream remote. You're going to refuse some branches
>> and what? Be further behind until some point in the future where the tip
>> is GPG-signed and you accept it, at which poich you'll need to do more
>> work than if you were up-to-date with the almost-GPG-signed version?
>
> That's about it. I want all fetching to be blocked in case of the tip
> not being signed. As there is a pre-push hook ensuring committers don't
> forget to sign before pushing, the only case the tip could not be signed
> is in case of an attack, which means it's better to just force-push
> master because any git repo that fetched it is doomed anyway. Definitely
> would not want to allow an untrusted revision get into anything that
> could even remotely be taken as “endorsed” by the user.
>
> (BTW, in order to avoid the case of someone forgetting to sign the
> commit and not having installed the pre-push hook, there can be holes in
> the commit-signing chain, the drawback being that the committer pushing
> a signed commit takes responsibility for all unsigned commits directly
> preceding his -- allowing them to recover in case of a mistaken push)
>
>> I think you're confusing two things here. One is the reasonable concern
>> of wanting to not have your local copy of remote refs have undesirable
>> content, but a pre-fetch hook is not the way to accomplish that.
>
> Well, a pre-fetch hook is a possible way of accomplishing that, and I
> don't know of any better one?
>
>> The other is e.g. to ensure that you never locally check out some "bad"
>> ref, we don't have hook support for that, but could add it,
>> e.g. git-checkout and git reset --hard could be taught about some
>> pre-checkout hook.
>
> Issue is, once we have to fix checkout and reset, all other commands
> that potentially touch the worktree also have to be fixed (eg. I don't
> know whether worktree add triggers pre-checkout?)
>
> Also, this requires the hook to store a database of all the paths that
> have been checked, because there is no logic in how one may choose to
> checkout the repo. While having a tweak-fetch hook would make the
> implementation straightforward, because at the time of invoking the hook
> the “refname at remote” commit is already trusted, and the “object name”
> is the commit whose validity we want to check, so we just have to check
> the path between those two. (I don't know if you checked my current
> scripts, but basically as the set of allowed PGP keys can change at any
> commit, it's only possible to check a commit path, not a single commit
> out-of-nowhere)

Right, it's certainly the case that refusing the refs is a more
effective gatekeeper, we're not going to have hooks for all the
low-level stuff that might inspect sha1s or check them out.

My assumption was that hooking into just a couple of things would be
good enough, but yes, there's other trade-offs.

> The only issue that could arise with a tweak-fetch hook is in case of a
> force-fetch (and even maybe it's not even an actual issue, I haven't
> given it real thought yet), but this can reasonably be banned, as once a
> commit is signed it enters the “real” master branch, that should never
> be moved backward, as it can't be the sign of an attack.
>
>> You could also have some intermediate step between these two, where
>> e.g. your refspec for "origin" is
>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
>> location, then you move them (with some alias/hook) to
>> "refs/remotes/origin/*" once they're seen to be "OK".
>
> That is indeed another possibility, but then the idea is to make things
> as transparent as possible for the end-user, not to completely change
> their git workflow. As such, considering only signed commits to be part
> of the upstream seems to make sense to me?

I mean this would be something that would be part of a post-fetch hook,
so it would be as transparent as what you're doing to the user, with the
difference that it doesn't need to involve changes to what you slurp
down from the server.

I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run
your post-fetch hook and you go over the new refs, and copy what you
like (usually everything) to refs/remotes/origin/*.

You get the same thing, the logic just becomes inspecting something
locally and copying it over.

There are other caveats, notably that the local ref store now needs to
store 2x the amount of branches, unless we're smarter about it and only
store stuff in refs/remotes/origin-untrusted/ that's different from
refs/remotes/origin/, as a sort of ref staging area.

Anyway, I don't have strong opinions on this, just thought it was worth
discussing the whys.

One thing that's not discussed yet, and I know just enough about for it
to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
Brandon who know more) is that this feature once shipped might cause
higher load on git hosting providers.

This is because people will inevitably use it in popular projects for
some custom filtering, and because you're continually re-fetching and
inspecting stuff what used to be a really cheap no-op "pull" most of the
time is a more expensive negotiation every time before the client
rejects the refs again, and worse for hosting providers because you have
bespoke ref fetching strategies you have less odds of being able to
cache both the negotiation and the pack you serve.

I.e. you want this for some security feature where 99.99% of the time
you accept all refs, but most people will probably use this to implement
dynamic Turing-complete refspecs.

Maybe that's worrying about nothing, but worth thinking about.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 22:04             ` Fetch-hooks Ævar Arnfjörð Bjarmason
@ 2018-02-09 22:24               ` Leo Gaspard
  2018-02-09 22:56                 ` Fetch-hooks Ævar Arnfjörð Bjarmason
  2018-02-09 22:30               ` Fetch-hooks Jeff King
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-09 22:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Joey Hess, git, Jeff King, Brandon Williams

On 02/09/2018 11:04 PM, Ævar Arnfjörð Bjarmason wrote:>>> You could also
have some intermediate step between these two, where
>>> e.g. your refspec for "origin" is
>>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
>>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
>>> location, then you move them (with some alias/hook) to
>>> "refs/remotes/origin/*" once they're seen to be "OK".
>>
>> That is indeed another possibility, but then the idea is to make things
>> as transparent as possible for the end-user, not to completely change
>> their git workflow. As such, considering only signed commits to be part
>> of the upstream seems to make sense to me?
> 
> I mean this would be something that would be part of a post-fetch hook,
> so it would be as transparent as what you're doing to the user, with the
> difference that it doesn't need to involve changes to what you slurp
> down from the server.
> 
> I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run
> your post-fetch hook and you go over the new refs, and copy what you
> like (usually everything) to refs/remotes/origin/*.

Hmm... but that would then require a post-fetch hook, wouldn't it? And
about a post-fetch hook, if I understood correctly, Junio in [1] had a
quite nice argument against it:

    Although I do not deeply care between such a "trigger to only
    notify, no touching" hook and a full-blown "allow hook writers to
    easily lie about what happened in the fetch" hook, I was hoping that
    we would get this right and useful if we were spending our brain
    bandwidth on it. I am not very fond of an easier "trigger to only
    notify" hook because people are bound to misuse the interface and
    try updating the refs anyway, making it easy to introduce
    inconsistencies between refs and FETCH_HEAD that will confuse the
    later "merge" step.

Otherwise, if it doesn't require a post-fetch hook, then it would
require the end-user to first fetch, then run the
`copy-trusted-refs-over` script, which would add stuff to the user's
workflow.

Did I miss another possibility?

> [...]
> 
> One thing that's not discussed yet, and I know just enough about for it
> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
> Brandon who know more) is that this feature once shipped might cause
> higher load on git hosting providers.
> 
> This is because people will inevitably use it in popular projects for
> some custom filtering, and because you're continually re-fetching and
> inspecting stuff what used to be a really cheap no-op "pull" most of the
> time is a more expensive negotiation every time before the client
> rejects the refs again, and worse for hosting providers because you have
> bespoke ref fetching strategies you have less odds of being able to
> cache both the negotiation and the pack you serve.
> 
> I.e. you want this for some security feature where 99.99% of the time
> you accept all refs, but most people will probably use this to implement
> dynamic Turing-complete refspecs.
> 
> Maybe that's worrying about nothing, but worth thinking about.

Well... First, I must say I didn't really understand your last paragraph
about Turing-complete refspecs.

But my understanding of how the fetch-hook patchset I sent this evening
works is that it first receives all the objects from the hosting
provider, then locally moves the refs, but never actually discards the
downloaded objects (well, until a `git gc` I guess).

So I don't think the network traffic with the provider would be any
different wrt. what it is now, even if a tweak-fetch hook rejects some
commits? Then again I don't know git's internals enough to be have even
a bit of certainty about what I'm saying right now, so...


[1] https://marc.info/?l=git&m=132480559712592&w=2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 22:04             ` Fetch-hooks Ævar Arnfjörð Bjarmason
  2018-02-09 22:24               ` Fetch-hooks Leo Gaspard
@ 2018-02-09 22:30               ` Jeff King
  2018-02-09 22:45                 ` Fetch-hooks Junio C Hamano
  2018-02-09 23:49                 ` Fetch-hooks Leo Gaspard
  1 sibling, 2 replies; 38+ messages in thread
From: Jeff King @ 2018-02-09 22:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Leo Gaspard, Joey Hess, git, Brandon Williams

On Fri, Feb 09, 2018 at 11:04:17PM +0100, Ævar Arnfjörð Bjarmason wrote:

> One thing that's not discussed yet, and I know just enough about for it
> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
> Brandon who know more) is that this feature once shipped might cause
> higher load on git hosting providers.
> 
> This is because people will inevitably use it in popular projects for
> some custom filtering, and because you're continually re-fetching and
> inspecting stuff what used to be a really cheap no-op "pull" most of the
> time is a more expensive negotiation every time before the client
> rejects the refs again, and worse for hosting providers because you have
> bespoke ref fetching strategies you have less odds of being able to
> cache both the negotiation and the pack you serve.

Most of the discussion so far seems to be about "accept this ref or
don't accept this ref", which seems OK. But if you are going to do
custom tweaking like rewriting objects, or making it common to refuse
some refs, then I think things get pretty inefficient for _everybody_.

The negotiation for future fetches uses the existing refs as the
starting point. And if we don't know that we have the objects because
there are no refs pointing at them, they're going to get transferred
again. That's extra load no the server, and extra time for the user
waiting on the network.

I tend to agree with the direction of thinking you outlined: you're
generally better off completing the fetch to a local namespace that
tracks the other side completely, and then manipulating the local refs
as you see fit (e.g., fetching into refs/quarantine, and then migrating
"good" refs over to refs/remotes/origin).

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] fetch: preparations for tweak-fetch hook
  2018-02-09 21:44               ` [PATCH 1/2] fetch: preparations for " Leo Gaspard
  2018-02-09 21:44                 ` [PATCH 2/2] fetch: add " Leo Gaspard
@ 2018-02-09 22:34                 ` " Junio C Hamano
  1 sibling, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2018-02-09 22:34 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: git, Joey Hess, Ævar Arnfjörð Bjarmason, Johannes Sixt

Leo Gaspard <leo@gaspard.io> writes:

> From: Léo Gaspard <leo@gaspard.io>
>
> No behavior changes yet, only some groundwork for the next change.
>
> The refs_result structure combines a status code with a ref map, which
> can be NULL even on success. 

Sorry, but I have absolutely no idea what this sentence wants to do
by being here in the log message.  "even on success" makes it sound
as if it is normal to be NULL when status code != success, but it is
not even clear why it is the case, and "can be NULL" implies that it
is not always NULL, but it does not make it clear when it is NULL
and when it is not NULL when code == success.  If you find the need
to explain what each field of a new struct you are introducing to
the codebase means (and it probably is a good idea for this one),
perhaps the right place to do so is in in-code comment next to the
struct definition?

It seems that you also moved add_existing() in the file, for no
apparent reason.

> This will be needed when there's a
> tweak-fetch hook, because it can filter out all refs, while still
> succeeding.
>
> fetch_refs returns a refs_result, so that it can modify the ref_map.

The description keeps saying "ref map", but it is unclear what it
is, why it is a good thing to allow the caller "modify" it, and what
kind of modification the caller wants to make for what reason.

Also, in C, we tend to avoid passing structure by value.  It seems
that the patch makes a callchain involving backfill_tags() return
this struct by value, which goes against the convention.  I suspect
that it should be trivial to have the caller supply an on-stack
instance and pass the pointer to it down the callchain, though.

> Based-on-patch-by: Joey Hess <joey@kitenet.net>
> Signed-off-by: Leo Gaspard <leo@gaspard.io>
> ---
>  builtin/fetch.c | 68 +++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 44 insertions(+), 24 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 7bbcd26fa..76dc05f61 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -34,6 +34,11 @@ enum {
>  	TAGS_SET = 2
>  };
>  
> +struct refs_result {
> +	struct ref *new_refs;
> +	int status;
> +};
> +
>  static int fetch_prune_config = -1; /* unspecified */
>  static int prune = -1; /* unspecified */
>  #define PRUNE_BY_DEFAULT 0 /* do we prune by default? */
> @@ -57,6 +62,18 @@ static int shown_url = 0;
>  static int refmap_alloc, refmap_nr;
>  static const char **refmap_array;
>  
> +static int add_existing(const char *refname, const struct object_id *oid,
> +			int flag, void *cbdata)
> +{
> +	struct string_list *list = (struct string_list *)cbdata;
> +	struct string_list_item *item = string_list_insert(list, refname);
> +	struct object_id *old_oid = xmalloc(sizeof(*old_oid));
> +
> +	oidcpy(old_oid, oid);
> +	item->util = old_oid;
> +	return 0;
> +}
> +
>  static int git_fetch_config(const char *k, const char *v, void *cb)
>  {
>  	if (!strcmp(k, "fetch.prune")) {
> @@ -217,18 +234,6 @@ static void add_merge_config(struct ref **head,
>  	}
>  }
>  
> -static int add_existing(const char *refname, const struct object_id *oid,
> -			int flag, void *cbdata)
> -{
> -	struct string_list *list = (struct string_list *)cbdata;
> -	struct string_list_item *item = string_list_insert(list, refname);
> -	struct object_id *old_oid = xmalloc(sizeof(*old_oid));
> -
> -	oidcpy(old_oid, oid);
> -	item->util = old_oid;
> -	return 0;
> -}
> -
>  static int will_fetch(struct ref **head, const unsigned char *sha1)
>  {
>  	struct ref *rm = *head;
> @@ -920,15 +925,20 @@ static int quickfetch(struct ref *ref_map)
>  	return check_connected(iterate_ref_map, &rm, &opt);
>  }
>  
> -static int fetch_refs(struct transport *transport, struct ref *ref_map)
> +static struct refs_result fetch_refs(struct transport *transport,
> +		struct ref *ref_map)
>  {
> -	int ret = quickfetch(ref_map);
> -	if (ret)
> -		ret = transport_fetch_refs(transport, ref_map);
> -	if (!ret)
> -		ret |= store_updated_refs(transport->url,
> +	struct refs_result ret;
> +	ret.status = quickfetch(ref_map);
> +	if (ret.status) {
> +		ret.status = transport_fetch_refs(transport, ref_map);
> +	}
> +	if (!ret.status) {
> +		ret.new_refs = ref_map;
> +		ret.status |= store_updated_refs(transport->url,
>  				transport->remote->name,
> -				ref_map);
> +				ret.new_refs);
> +	}
>  	transport_unlock_pack(transport);
>  	return ret;
>  }
> @@ -1048,9 +1058,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
>  	return transport;
>  }
>  
> -static void backfill_tags(struct transport *transport, struct ref *ref_map)
> +static struct refs_result backfill_tags(struct transport *transport,
> +		struct ref *ref_map)
>  {
>  	int cannot_reuse;
> +	struct refs_result res;
>  
>  	/*
>  	 * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it
> @@ -1069,12 +1081,14 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
>  	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
>  	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
>  	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
> -	fetch_refs(transport, ref_map);
> +	res = fetch_refs(transport, ref_map);
>  
>  	if (gsecondary) {
>  		transport_disconnect(gsecondary);
>  		gsecondary = NULL;
>  	}
> +
> +	return res;
>  }
>  
>  static int do_fetch(struct transport *transport,
> @@ -1083,6 +1097,7 @@ static int do_fetch(struct transport *transport,
>  	struct string_list existing_refs = STRING_LIST_INIT_DUP;
>  	struct ref *ref_map;
>  	struct ref *rm;
> +	struct refs_result res;
>  	int autotags = (transport->remote->fetch_tags == 1);
>  	int retcode = 0;
>  
> @@ -1135,7 +1150,10 @@ static int do_fetch(struct transport *transport,
>  				   transport->url);
>  		}
>  	}
> -	if (fetch_refs(transport, ref_map)) {
> +
> +	res = fetch_refs(transport, ref_map);
> +	ref_map = res.new_refs;
> +	if (res.status) {
>  		free_refs(ref_map);
>  		retcode = 1;
>  		goto cleanup;
> @@ -1148,8 +1166,10 @@ static int do_fetch(struct transport *transport,
>  		struct ref **tail = &ref_map;
>  		ref_map = NULL;
>  		find_non_local_tags(transport, &ref_map, &tail);
> -		if (ref_map)
> -			backfill_tags(transport, ref_map);
> +		if (ref_map) {
> +			res = backfill_tags(transport, ref_map);
> +			ref_map = res.new_refs;
> +		}
>  		free_refs(ref_map);
>  	}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/2] fetch: add tweak-fetch hook
  2018-02-09 21:44                 ` [PATCH 2/2] fetch: add " Leo Gaspard
@ 2018-02-09 22:40                   ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2018-02-09 22:40 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: git, Joey Hess, Ævar Arnfjörð Bjarmason, Johannes Sixt

Leo Gaspard <leo@gaspard.io> writes:

> +tweak-fetch
> +~~~~~~~~~~~
> +
> +This hook is invoked by 'git fetch' (commonly called by 'git pull'), after refs
> +have been fetched from the remote repository. It is not executed, if nothing was
> +fetched.

Need to tighten explanation of "nothing was fetched".  If the only
change I made to my repository is that I created a new branch that
points at an existing object since last time you fetched, you would
obtain no new object when you fetch from me.  Would that count as
"nothing was fetched"?  Or would it be still fetching something
(i.e. your remote-tracking hierarchy will record the fact that I now
have this new branch)?

> +  <sha1> SP not-for-merge|merge|ignore SP <remote-refname> SP <local-refname> LF
> + ...
> +The `<remote-refname>` is the remote's name for the ref that was fetched, and
> +`<local-refname>` is a name of a remote-tracking branch, like
> +"refs/remotes/origin/master". `<local-refname>` can be undefined if the fetched
> +ref is not being stored in a local refname. In this case, it will be set to `@`,

Don't use "@"; leave it empty instead.

> +TODO: Add documentation for the “ignore” parameter. Unfortunately, I'm not
> +really sure I get what this does or what invariants it is supposed to maintain
> +(eg. all “ignore” updates at the end of the refs list?), so this may also
> +require code changes.

If you are not using the feature, wouldn't it make more sense not to
add it in the first place?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 22:30               ` Fetch-hooks Jeff King
@ 2018-02-09 22:45                 ` Junio C Hamano
  2018-02-09 23:49                 ` Fetch-hooks Leo Gaspard
  1 sibling, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2018-02-09 22:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Leo Gaspard, Joey Hess,
	git, Brandon Williams

Jeff King <peff@peff.net> writes:

> The negotiation for future fetches uses the existing refs as the
> starting point. And if we don't know that we have the objects because
> there are no refs pointing at them, they're going to get transferred
> again. That's extra load no the server, and extra time for the user
> waiting on the network.
>
> I tend to agree with the direction of thinking you outlined: you're
> generally better off completing the fetch to a local namespace that
> tracks the other side completely, and then manipulating the local refs
> as you see fit (e.g., fetching into refs/quarantine, and then migrating
> "good" refs over to refs/remotes/origin).

Thanks for a dose of sanity ;-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 22:24               ` Fetch-hooks Leo Gaspard
@ 2018-02-09 22:56                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 38+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-02-09 22:56 UTC (permalink / raw)
  To: Leo Gaspard; +Cc: Joey Hess, git, Jeff King, Brandon Williams


On Fri, Feb 09 2018, Leo Gaspard jotted:

> On 02/09/2018 11:04 PM, Ævar Arnfjörð Bjarmason wrote:>>> You could also
> have some intermediate step between these two, where
>>>> e.g. your refspec for "origin" is
>>>> "+refs/heads/*:refs/remotes/origin-untrusted/*" instead of the default
>>>> "+refs/heads/*:refs/remotes/origin/*", you fetch all refs to that
>>>> location, then you move them (with some alias/hook) to
>>>> "refs/remotes/origin/*" once they're seen to be "OK".
>>>
>>> That is indeed another possibility, but then the idea is to make things
>>> as transparent as possible for the end-user, not to completely change
>>> their git workflow. As such, considering only signed commits to be part
>>> of the upstream seems to make sense to me?
>>
>> I mean this would be something that would be part of a post-fetch hook,
>> so it would be as transparent as what you're doing to the user, with the
>> difference that it doesn't need to involve changes to what you slurp
>> down from the server.
>>
>> I.e. we'd just fetch into refs/remotes/origin-untrusted/, then we run
>> your post-fetch hook and you go over the new refs, and copy what you
>> like (usually everything) to refs/remotes/origin/*.
>
> Hmm... but that would then require a post-fetch hook, wouldn't it? And
> about a post-fetch hook, if I understood correctly, Junio in [1] had a
> quite nice argument against it:
>
>     Although I do not deeply care between such a "trigger to only
>     notify, no touching" hook and a full-blown "allow hook writers to
>     easily lie about what happened in the fetch" hook, I was hoping that
>     we would get this right and useful if we were spending our brain
>     bandwidth on it. I am not very fond of an easier "trigger to only
>     notify" hook because people are bound to misuse the interface and
>     try updating the refs anyway, making it easy to introduce
>     inconsistencies between refs and FETCH_HEAD that will confuse the
>     later "merge" step.
>
> Otherwise, if it doesn't require a post-fetch hook, then it would
> require the end-user to first fetch, then run the
> `copy-trusted-refs-over` script, which would add stuff to the user's
> workflow.
>
> Did I miss another possibility?

Yes, the assumption is that there would be a post-fetch hook of some
sort to ferry the refs over from the quarantine.

My reading of that thread is not that Junio's outright against such a
facility (and also it was 2011 and the discussion can be re-visited),
but rather that there's specific concerns that need to be kept in mind
so reliable things involving the ref store don't become brittle as a
result of unstable user hooks, or us offering an interface that's easily
misused.

>> [...]
>>
>> One thing that's not discussed yet, and I know just enough about for it
>> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
>> Brandon who know more) is that this feature once shipped might cause
>> higher load on git hosting providers.
>>
>> This is because people will inevitably use it in popular projects for
>> some custom filtering, and because you're continually re-fetching and
>> inspecting stuff what used to be a really cheap no-op "pull" most of the
>> time is a more expensive negotiation every time before the client
>> rejects the refs again, and worse for hosting providers because you have
>> bespoke ref fetching strategies you have less odds of being able to
>> cache both the negotiation and the pack you serve.
>>
>> I.e. you want this for some security feature where 99.99% of the time
>> you accept all refs, but most people will probably use this to implement
>> dynamic Turing-complete refspecs.
>>
>> Maybe that's worrying about nothing, but worth thinking about.
>
> Well... First, I must say I didn't really understand your last paragraph
> about Turing-complete refspecs.
>
> But my understanding of how the fetch-hook patchset I sent this evening
> works is that it first receives all the objects from the hosting
> provider, then locally moves the refs, but never actually discards the
> downloaded objects (well, until a `git gc` I guess).
>
> So I don't think the network traffic with the provider would be any
> different wrt. what it is now, even if a tweak-fetch hook rejects some
> commits? Then again I don't know git's internals enough to be have even
> a bit of certainty about what I'm saying right now, so...
>
>
> [1] https://marc.info/?l=git&m=132480559712592&w=2

As Jeff notes in the side-thread in
<20180209223011.GA24578@sigill.intra.peff.net> when you do a "fetch" the
protocol is not negotiating on the basis of what loose unreferenced
(because you threw away the ref!) objects you have already, but what
the ref commonality is between you and the server (and this is just on
a best-effort basis).

So for example, let's say you only accept the "master" branch and have a
hook that's refusing updates to the "dev" branch, fetch is going to be
re-negotiating fetching the difference between the two over the wire
every time, only to have the hook say "no thanks" and throw away the
result.

I.e. as an extreme example, if the dev branch is adding a 1GB file
divergent from master, you are going to be re-downloading that 1GB every
time you fetch, even though you have that blob locally, the remote can't
know that, and because you have no ref pointing to it you can't mention
it during the negotiation.

Which is way less efficient than the case where your refspec only
fetches "master", since then we won't try at all, or the case where you
fetch both master & dev, but into some quarantine area and are just
deciding to update a local ref or not.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 22:30               ` Fetch-hooks Jeff King
  2018-02-09 22:45                 ` Fetch-hooks Junio C Hamano
@ 2018-02-09 23:49                 ` Leo Gaspard
  2018-02-10  0:13                   ` Fetch-hooks Jeff King
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-09 23:49 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Joey Hess, git, Brandon Williams, Junio C Hamano

On 02/09/2018 11:30 PM, Jeff King wrote:
> On Fri, Feb 09, 2018 at 11:04:17PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> One thing that's not discussed yet, and I know just enough about for it
>> to tingle my spidey sense, but not enough to say for sure (CC'd Jeff &
>> Brandon who know more) is that this feature once shipped might cause
>> higher load on git hosting providers.
>>
>> This is because people will inevitably use it in popular projects for
>> some custom filtering, and because you're continually re-fetching and
>> inspecting stuff what used to be a really cheap no-op "pull" most of the
>> time is a more expensive negotiation every time before the client
>> rejects the refs again, and worse for hosting providers because you have
>> bespoke ref fetching strategies you have less odds of being able to
>> cache both the negotiation and the pack you serve.
> 
> Most of the discussion so far seems to be about "accept this ref or
> don't accept this ref", which seems OK. But if you are going to do
> custom tweaking like rewriting objects, or making it common to refuse
> some refs, then I think things get pretty inefficient for _everybody_.
> 
> The negotiation for future fetches uses the existing refs as the
> starting point. And if we don't know that we have the objects because
> there are no refs pointing at them, they're going to get transferred
> again. That's extra load no the server, and extra time for the user
> waiting on the network.

Oh. I thought the protocol git used was something like
client: I want to fetch refs A and B
server: so you'll need objects 12345678 and 90ABCDEF, A and B both point
to 12345678
client: please give me object 12345678
server: here it is
[...]

I was clearly wrong, thanks! (and thanks Ævar for your explanation in
the side-thread, too!)

> I tend to agree with the direction of thinking you outlined: you're
> generally better off completing the fetch to a local namespace that
> tracks the other side completely, and then manipulating the local refs
> as you see fit (e.g., fetching into refs/quarantine, and then migrating
> "good" refs over to refs/remotes/origin).

Hmm... so do I understand it correctly when I say the process you're
thinking about works like this?
 * User installs hook for my-remote by running [something]
 * User runs git fetch
 * git fetch fetches remote refs/heads/* to local refs/quarantine/* (so
I guess [something] changes the remote.my-remote.fetch refmap)
 * When this is done `git fetch` runs a notification-only post-fetch
hook (that would need to be added)
 * The post-fetch hook then performs whatever it wants and updates the
references in refs/remotes/my-remote/*

So the changes that are required are:
 * Adding a notification-only post-fetch hook
 * For handling tags, there is a need to have a refmap for tags. Maybe
adding a remote.my-remote.fetchTags refmap, that would be used when
running with --tags, and having it default to “refs/tags/*:refs/tags/*”
to keep the current behavior by default?

The only remaining issue I can think of is: How do we avoid the issue of
the
trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification
raised in the side-thread that Junio wrote in [1]? Maybe just writing in
the documentation that the hook should use a quarantine-like approach if
it wants modification would be enough to not have hook authors try to
modify the ref in the post-fetch hook?

Thanks for all your thoughts, and hope we're getting somewhere!
Leo


PS: I'll read over the reviews once I'm all clear as to what exactly is
wanted for this patch, as most likely they'll just be dumped, given the
current state of affairs.

[1] https://marc.info/?l=git&m=132480559712592&w=2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-09 23:49                 ` Fetch-hooks Leo Gaspard
@ 2018-02-10  0:13                   ` Jeff King
  2018-02-10  0:37                     ` Fetch-hooks Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2018-02-10  0:13 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote:

> > I tend to agree with the direction of thinking you outlined: you're
> > generally better off completing the fetch to a local namespace that
> > tracks the other side completely, and then manipulating the local refs
> > as you see fit (e.g., fetching into refs/quarantine, and then migrating
> > "good" refs over to refs/remotes/origin).
> 
> Hmm... so do I understand it correctly when I say the process you're
> thinking about works like this?
>  * User installs hook for my-remote by running [something]
>  * User runs git fetch
>  * git fetch fetches remote refs/heads/* to local refs/quarantine/* (so
> I guess [something] changes the remote.my-remote.fetch refmap)
>  * When this is done `git fetch` runs a notification-only post-fetch
> hook (that would need to be added)
>  * The post-fetch hook then performs whatever it wants and updates the
> references in refs/remotes/my-remote/*

Yeah, that was roughly my thinking.

> So the changes that are required are:
>  * Adding a notification-only post-fetch hook
>  * For handling tags, there is a need to have a refmap for tags. Maybe
> adding a remote.my-remote.fetchTags refmap, that would be used when
> running with --tags, and having it default to “refs/tags/*:refs/tags/*”
> to keep the current behavior by default?

Yeah, tag-following may be a little tricky, because it usually wants to
write to refs/tags/. One workaround would be to have your config look
like this:

  [remote "origin"]
  fetch = +refs/heads/*:refs/quarantine/origin/heads/*
  fetch = +refs/tags/*:refs/quarantine/origin/tags/*
  tagOpt = --no-tags

That's not exactly the same thing, because it would fetch all tags, not
just those that point to the history on the branches. But in most
repositories and workflows the distinction doesn't matter.

(By the way, the I specifically chose the name "refs/quarantine" instead
of anything in "refs/remotes" because we'd want to make sure that the
"git checkout" DWIM behavior cannot accidentally pull from quarantine).

> The only remaining issue I can think of is: How do we avoid the issue
> of the
> trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification
> raised in the side-thread that Junio wrote in [1]? Maybe just writing
> in the documentation that the hook should use a quarantine-like
> approach if it wants modification would be enough to not have hook
> authors try to modify the ref in the post-fetch hook?

I don't have a silver bullet there. Documenting the "right" way at least
seems like a good first step.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10  0:13                   ` Fetch-hooks Jeff King
@ 2018-02-10  0:37                     ` Leo Gaspard
  2018-02-10  1:08                       ` Fetch-hooks Junio C Hamano
  2018-02-10 12:21                       ` Fetch-hooks Jeff King
  0 siblings, 2 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-10  0:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On 02/10/2018 01:13 AM, Jeff King wrote:
> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote:
>> So the changes that are required are:
>>  * Adding a notification-only post-fetch hook
>>  * For handling tags, there is a need to have a refmap for tags. Maybe
>> adding a remote.my-remote.fetchTags refmap, that would be used when
>> running with --tags, and having it default to “refs/tags/*:refs/tags/*”
>> to keep the current behavior by default?
> 
> Yeah, tag-following may be a little tricky, because it usually wants to
> write to refs/tags/. One workaround would be to have your config look
> like this:
> 
>   [remote "origin"]
>   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
>   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
>   tagOpt = --no-tags
> 
> That's not exactly the same thing, because it would fetch all tags, not
> just those that point to the history on the branches. But in most
> repositories and workflows the distinction doesn't matter.

Hmm... apart from the implementation complexity (of which I have no
idea), is there an argument against the idea of adding a
remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used
every time a tag is fetched? (well, maybe not exactly similar to
remote.<name>.fetch because we know the source is going to be
refs/tags/*; so just having the part of .fetch past the ':' would be
more like what's needed I guess)

The issue with your solution is that if the user runs 'git fetch
--tags', he will get the (potentially compromised) tags directly in his
refs/tags/.

> (By the way, the I specifically chose the name "refs/quarantine" instead
> of anything in "refs/remotes" because we'd want to make sure that the
> "git checkout" DWIM behavior cannot accidentally pull from quarantine).

(Indeed, I understood after reading it, and would likely not have
thought of it otherwise, thanks!)

>> The only remaining issue I can think of is: How do we avoid the issue
>> of the
>> trigger-only-hook-inciting-bad-behavior-by-hook-authors-who-really-want-modification
>> raised in the side-thread that Junio wrote in [1]? Maybe just writing
>> in the documentation that the hook should use a quarantine-like
>> approach if it wants modification would be enough to not have hook
>> authors try to modify the ref in the post-fetch hook?
> 
> I don't have a silver bullet there. Documenting the "right" way at least
> seems like a good first step.

So long as it's not a merge-blocker it's good with me! (but then I'm
likely not the one who's going to be pointed at when things go wrong in
a hook, so I'm clearly biased on this matter)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10  0:37                     ` Fetch-hooks Leo Gaspard
@ 2018-02-10  1:08                       ` Junio C Hamano
  2018-02-10  1:33                         ` Fetch-hooks Leo Gaspard
  2018-02-10 12:21                       ` Fetch-hooks Jeff King
  1 sibling, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2018-02-10  1:08 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess,
	git, Brandon Williams

Leo Gaspard <leo@gaspard.io> writes:

> On 02/10/2018 01:13 AM, Jeff King wrote:
>> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote:
>>> So the changes that are required are:
>>>  * Adding a notification-only post-fetch hook

Maybe I missed a very early part of the discussion, but why does
this even need a hook?  There are some numbers [*1*] of classes of
valid reasons we may want to have hooks triggered by operations, but
"always do something locally after doing something else locally,
regardless of the outcome of that something else" feels like the
most typical anti-pattern that we do not want a hook for.  If you
are doing "git fetch" (or "git pull"), you already know you are
doing that and you donot need a notification.  You just create a
workflow specific script that calls fetch or pull, followed by
whatever you want to do and use that, instead of doing "git pull",
and that is not any extra work than writing a hook and installing
it.

Unlike something like post-receive, which happens on the remote side
where you may not even have an interactive access to, in response to
the operation you locally do (i.e. "git push"), fetching and then
doing something else in a repository you fetch into has no reason to
be done as a hook.

[Footnote]

*1* I think the number was 5 when I last counted/enumerated, but
    don't quote me on that ;-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10  1:08                       ` Fetch-hooks Junio C Hamano
@ 2018-02-10  1:33                         ` Leo Gaspard
  2018-02-10 18:03                           ` Fetch-hooks Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-10  1:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess,
	git, Brandon Williams

On 02/10/2018 02:08 AM, Junio C Hamano wrote:
> Leo Gaspard <leo@gaspard.io> writes:
> 
>> On 02/10/2018 01:13 AM, Jeff King wrote:
>>> On Sat, Feb 10, 2018 at 12:49:31AM +0100, Leo Gaspard wrote:
>>>> So the changes that are required are:
>>>>  * Adding a notification-only post-fetch hook
> 
> Maybe I missed a very early part of the discussion, but why does
> this even need a hook?  There are some numbers [*1*] of classes of
> valid reasons we may want to have hooks triggered by operations, but
> "always do something locally after doing something else locally,
> regardless of the outcome of that something else" feels like the
> most typical anti-pattern that we do not want a hook for.  If you
> are doing "git fetch" (or "git pull"), you already know you are
> doing that and you donot need a notification.  You just create a
> workflow specific script that calls fetch or pull, followed by
> whatever you want to do and use that, instead of doing "git pull",
> and that is not any extra work than writing a hook and installing
> it.
> 
> Unlike something like post-receive, which happens on the remote side
> where you may not even have an interactive access to, in response to
> the operation you locally do (i.e. "git push"), fetching and then
> doing something else in a repository you fetch into has no reason to
> be done as a hook.

I guess the very early part of the discussion you're speaking of is what
I was assuming after reading
    https://marc.info/?l=git&m=132478296309094&w=2

Then, it's always possible to just write workflow-specific scripts that
manually run git fetch then copy the refs (resp. run git fetch then copy
the refs then run git merge) to get git myfetch (resp. git mypull) [1].

But then, the question is, why is there a pre-push hook? it's already
possible to have a custom script that first runs the hook then runs git
push, for most if not all of the use cases of the pre-push hook. Yet the
possibility to not change the end-user's workflow is what makes pre-push
(or pre-commit, with which the similarity is perhaps even more obvious)
so useful.

So the reason for a post-fetch in my opinion is the same as for a
pre-push / pre-commit: not changing the user's workflow, while providing
additional features.


[1]

Which makes me notice that actually the post-fetch hook technique we
were discussing with Peff suffers the same not-updating-FETCH_HEAD issue
that was discussed in this early thread: a `git pull` would try to merge
from refs/quarantine, I guess. So things are a bit harder than we thought.

I guess the tweak-fetch hook could be left “as-is”, but with git
automatically doing the “quarantine-ing” thing transparently so that the
references that the end-user (or the hook for that matter) see are the
“curated” ones? Then it's too late for me to think efficiently right
now, so that idea may be stupid or over-complex.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10  0:37                     ` Fetch-hooks Leo Gaspard
  2018-02-10  1:08                       ` Fetch-hooks Junio C Hamano
@ 2018-02-10 12:21                       ` Jeff King
  2018-02-10 18:36                         ` Fetch-hooks Leo Gaspard
  2018-02-14  1:46                         ` Fetch-hooks Jacob Keller
  1 sibling, 2 replies; 38+ messages in thread
From: Jeff King @ 2018-02-10 12:21 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote:

> > Yeah, tag-following may be a little tricky, because it usually wants to
> > write to refs/tags/. One workaround would be to have your config look
> > like this:
> > 
> >   [remote "origin"]
> >   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
> >   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
> >   tagOpt = --no-tags
> > 
> > That's not exactly the same thing, because it would fetch all tags, not
> > just those that point to the history on the branches. But in most
> > repositories and workflows the distinction doesn't matter.
> 
> Hmm... apart from the implementation complexity (of which I have no
> idea), is there an argument against the idea of adding a
> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used
> every time a tag is fetched? (well, maybe not exactly similar to
> remote.<name>.fetch because we know the source is going to be
> refs/tags/*; so just having the part of .fetch past the ':' would be
> more like what's needed I guess)

I don't think it would be too hard to implement, and is at least
logically consistent with the way we handle tags.

If we were designing from scratch, I do think this would all be helped
immensely by having more separation of refs fetched from remotes. There
was a proposal in the v1.8 era to fetch everything for a remote,
including tags, into "refs/remotes/origin/heads/",
"refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to
look for tags in each of the remote-tag namespaces.

I still think that would be a good direction to go, but it would be a
big project (which is why the original stalled).

> The issue with your solution is that if the user runs 'git fetch
> --tags', he will get the (potentially compromised) tags directly in his
> refs/tags/.

True.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10  1:33                         ` Fetch-hooks Leo Gaspard
@ 2018-02-10 18:03                           ` Leo Gaspard
  0 siblings, 0 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-10 18:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess,
	git, Brandon Williams

On 02/10/2018 02:33 AM, Leo Gaspard wrote:> I guess the very early part
of the discussion you're speaking of is what
> I was assuming after reading
>     https://marc.info/?l=git&m=132478296309094&w=2
> 
> [...]
> 
> So the reason for a post-fetch in my opinion is the same as for a
> pre-push / pre-commit: not changing the user's workflow, while providing
> additional features.
Well, re-reading my email, it looks aggressive to me now... sorry about
that!

What I meant was basically, that in my mind pre-push or pre-commit are
also local-only things, and they are really useful in that they allow to
block the push or the commit if some conditions are not met (eg. block
commit with trailing whitespace, or block push of unsigned commits).

In pretty much the same way, what I'm really looking for is a way to
“block the fetch” (from a user-visible standpoint) -- like pre-push but
in the opposite direction. I hope such a goal is not an anti-pattern for
hook design?

In order to do this, I first tried updating this tweak-fetch hook patch
from late 2011, and then learned that it would cause too high a load on
servers. Then, while brainstorming another solution, this idea of a
notification-only post-fetch hook arose. But, as I noticed while writing
my previous answer, this suffers the same
the-hook-writer-must-correctly-update-FETCH_HEAD-concurrently-with-remote-branch
issue that you pointed out just after the “*HOWEVER*” in [1]. So that
solution is likely a bad solution too. I guess we'll continue the search
for a good solution in the side-thread with Peff, hoping to figure one out.

That being said about what I meant in my last email, obviously you're
the one who has the say on what goes in or not! And if it's an
anti-feature I'd much rather know it now than after spending a few
nights coding :)

So, what do you think about this use case I'm thinking of? (ie.
“blocking the fetch” like pre-push “blocks the push” and pre-commit
“blocks the commit”?)


[1] https://marc.info/?l=git&m=132478296309094&w=2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10 12:21                       ` Fetch-hooks Jeff King
@ 2018-02-10 18:36                         ` Leo Gaspard
  2018-02-12 19:23                           ` Fetch-hooks Brandon Williams
  2018-02-14  1:35                           ` Fetch-hooks Jeff King
  2018-02-14  1:46                         ` Fetch-hooks Jacob Keller
  1 sibling, 2 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-10 18:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On 02/10/2018 01:21 PM, Jeff King wrote:
> On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote:
> 
>>> Yeah, tag-following may be a little tricky, because it usually wants to
>>> write to refs/tags/. One workaround would be to have your config look
>>> like this:
>>>
>>>   [remote "origin"]
>>>   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
>>>   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
>>>   tagOpt = --no-tags
>>>
>>> That's not exactly the same thing, because it would fetch all tags, not
>>> just those that point to the history on the branches. But in most
>>> repositories and workflows the distinction doesn't matter.
>>
>> Hmm... apart from the implementation complexity (of which I have no
>> idea), is there an argument against the idea of adding a
>> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used
>> every time a tag is fetched? (well, maybe not exactly similar to
>> remote.<name>.fetch because we know the source is going to be
>> refs/tags/*; so just having the part of .fetch past the ':' would be
>> more like what's needed I guess)
> 
> I don't think it would be too hard to implement, and is at least
> logically consistent with the way we handle tags.
> 
> If we were designing from scratch, I do think this would all be helped
> immensely by having more separation of refs fetched from remotes. There
> was a proposal in the v1.8 era to fetch everything for a remote,
> including tags, into "refs/remotes/origin/heads/",
> "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to
> look for tags in each of the remote-tag namespaces.
> 
> I still think that would be a good direction to go, but it would be a
> big project (which is why the original stalled).

Hmm... would this also drown the remote.<name>.fetch map? Also, I think
this behavior could be emulated with fetch and fetchTagsTo, and it would
look like:
[remote "my-remote"]
    fetch = +refs/heads/*:refs/remotes/my-remote/heads/*
    fetchTagsTo = refs/remotes/my-remote/tags/*
The remaining issue being to teach the lookup side to look for tags in
all the remote-tag namespaces (and the fact it's a breaking change).

That said, actually I just noticed an issue in the “add a
remote.<name>.fetch option to fetch to refs/quarantine then have the
post-fetch hook do the work”: it means if I run `git pull`, then:
 1. The remote references will be pulled to refs/quarantine/...
 2. The post-fetch hook will copy the accepted ones to refs/remotes/...
 3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into
local branches... and so merge from refs/quarantine.

A solution would be to also update FETCH_HEAD in the post-fetch hook,
but then we're back to the issue raised by Junio after the “*HOWEVER*”
of [1]: the hook writer has to maintain consistency between the “copied”
references and FETCH_HEAD.

So, when thinking about it, I'm back to thinking the proper hook
interface should be the one of the tweak-fetch hook, but its
implementation should make it not go crazy on remote servers. And so
that the implementation should do all this refs/quarantine wizardry
inside git itself.

So basically what I'm getting at at the moment is that git fetch should:
 1. fetch all the references to refs/real-remotes/<name>/{insert here a
copy of the refs/ tree of <name>}
 2. figure out what the “expected” names for these references will by,
by looking at remote.<name>.fetch (and at whether --tags was passed)
 3. for each “expected” name,
     1. if a tweak-fetch hook is present, call it with the
refs/real-remotes/... refname and the “expected end-name” from
remote.<name>.fetch
     2. based on the hook's result, potentially to move the “expected
end-name” to another commit than the one referenced by refs/real-remotes/...
     3. move the “expected” name to the commit referenced in
refs/real-remotes

Which would make the tweak-fetch hook interface simpler (though more
restrictive, but I don't have any real use case for the other change
possibilities) than it is now:
 1. feed the hook with lines of
“refs/real-remotes/my-remote/heads/my-branch
refs/remotes/my-remote/my-branch” (assuming remote.my-remote.fetch is
“normal”) or “refs/real-remotes/my-remote/tags/my-tag refs/tags/my-tag”
(if my-tag is being fetched from my-remote)
 2. read lines of “<refspec> refs/remotes/my-remote/my-branch”, that
will re-point my-branch to <refspec> instead of
refs/real-remotes/my-remote/heads/my-branch. In order to avoid any
issue, the hook is not allowed to pass as second output a reference that
was not passed as second input.

This way, the behavior of the tweak-fetch hook is reasonably preserved
(at least for my use case), and there is no additional load on the
servers thanks to the up-to-date references being stored in
refs/real-remotes/<name>/<refspec>

Does this reasoning make any sense?


[1] https://marc.info/?l=git&m=132478296309094&w=2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10 18:36                         ` Fetch-hooks Leo Gaspard
@ 2018-02-12 19:23                           ` Brandon Williams
  2018-02-13 15:44                             ` Fetch-hooks Leo Gaspard
  2018-02-14  1:38                             ` Fetch-hooks Jeff King
  2018-02-14  1:35                           ` Fetch-hooks Jeff King
  1 sibling, 2 replies; 38+ messages in thread
From: Brandon Williams @ 2018-02-12 19:23 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess,
	git, Junio C Hamano

On 02/10, Leo Gaspard wrote:
> On 02/10/2018 01:21 PM, Jeff King wrote:
> > On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote:
> > 
> >>> Yeah, tag-following may be a little tricky, because it usually wants to
> >>> write to refs/tags/. One workaround would be to have your config look
> >>> like this:
> >>>
> >>>   [remote "origin"]
> >>>   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
> >>>   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
> >>>   tagOpt = --no-tags
> >>>
> >>> That's not exactly the same thing, because it would fetch all tags, not
> >>> just those that point to the history on the branches. But in most
> >>> repositories and workflows the distinction doesn't matter.
> >>
> >> Hmm... apart from the implementation complexity (of which I have no
> >> idea), is there an argument against the idea of adding a
> >> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used
> >> every time a tag is fetched? (well, maybe not exactly similar to
> >> remote.<name>.fetch because we know the source is going to be
> >> refs/tags/*; so just having the part of .fetch past the ':' would be
> >> more like what's needed I guess)
> > 
> > I don't think it would be too hard to implement, and is at least
> > logically consistent with the way we handle tags.
> > 
> > If we were designing from scratch, I do think this would all be helped
> > immensely by having more separation of refs fetched from remotes. There
> > was a proposal in the v1.8 era to fetch everything for a remote,
> > including tags, into "refs/remotes/origin/heads/",
> > "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to
> > look for tags in each of the remote-tag namespaces.
> > 
> > I still think that would be a good direction to go, but it would be a
> > big project (which is why the original stalled).
> 
> Hmm... would this also drown the remote.<name>.fetch map? Also, I think
> this behavior could be emulated with fetch and fetchTagsTo, and it would
> look like:
> [remote "my-remote"]
>     fetch = +refs/heads/*:refs/remotes/my-remote/heads/*
>     fetchTagsTo = refs/remotes/my-remote/tags/*
> The remaining issue being to teach the lookup side to look for tags in
> all the remote-tag namespaces (and the fact it's a breaking change).
> 
> That said, actually I just noticed an issue in the “add a
> remote.<name>.fetch option to fetch to refs/quarantine then have the
> post-fetch hook do the work”: it means if I run `git pull`, then:
>  1. The remote references will be pulled to refs/quarantine/...
>  2. The post-fetch hook will copy the accepted ones to refs/remotes/...
>  3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into
> local branches... and so merge from refs/quarantine.
> 
> A solution would be to also update FETCH_HEAD in the post-fetch hook,
> but then we're back to the issue raised by Junio after the “*HOWEVER*”
> of [1]: the hook writer has to maintain consistency between the “copied”
> references and FETCH_HEAD.
> 
> So, when thinking about it, I'm back to thinking the proper hook
> interface should be the one of the tweak-fetch hook, but its
> implementation should make it not go crazy on remote servers. And so
> that the implementation should do all this refs/quarantine wizardry
> inside git itself.
> 
> So basically what I'm getting at at the moment is that git fetch should:
>  1. fetch all the references to refs/real-remotes/<name>/{insert here a
> copy of the refs/ tree of <name>}
>  2. figure out what the “expected” names for these references will by,
> by looking at remote.<name>.fetch (and at whether --tags was passed)
>  3. for each “expected” name,
>      1. if a tweak-fetch hook is present, call it with the
> refs/real-remotes/... refname and the “expected end-name” from
> remote.<name>.fetch
>      2. based on the hook's result, potentially to move the “expected
> end-name” to another commit than the one referenced by refs/real-remotes/...
>      3. move the “expected” name to the commit referenced in
> refs/real-remotes
> 
> Which would make the tweak-fetch hook interface simpler (though more
> restrictive, but I don't have any real use case for the other change
> possibilities) than it is now:
>  1. feed the hook with lines of
> “refs/real-remotes/my-remote/heads/my-branch
> refs/remotes/my-remote/my-branch” (assuming remote.my-remote.fetch is
> “normal”) or “refs/real-remotes/my-remote/tags/my-tag refs/tags/my-tag”
> (if my-tag is being fetched from my-remote)
>  2. read lines of “<refspec> refs/remotes/my-remote/my-branch”, that
> will re-point my-branch to <refspec> instead of
> refs/real-remotes/my-remote/heads/my-branch. In order to avoid any
> issue, the hook is not allowed to pass as second output a reference that
> was not passed as second input.
> 
> This way, the behavior of the tweak-fetch hook is reasonably preserved
> (at least for my use case), and there is no additional load on the
> servers thanks to the up-to-date references being stored in
> refs/real-remotes/<name>/<refspec>
> 
> Does this reasoning make any sense?
> 
> 
> [1] https://marc.info/?l=git&m=132478296309094&w=2


Maybe this isn't helpful but you may be able to implement this by using
a remote-helper.  The helper could perform any sort of caching it needed
to prevent re-downloading large amounts of data that is potentially
thrown away, while only sending through the relevant commits which
satisfy some criteria (signed, etc.).

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-12 19:23                           ` Fetch-hooks Brandon Williams
@ 2018-02-13 15:44                             ` Leo Gaspard
  2018-02-14  1:38                             ` Fetch-hooks Jeff King
  1 sibling, 0 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-13 15:44 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess,
	git, Junio C Hamano

On 02/12/2018 08:23 PM, Brandon Williams wrote:> Maybe this isn't
helpful but you may be able to implement this by using
> a remote-helper.  The helper could perform any sort of caching it needed
> to prevent re-downloading large amounts of data that is potentially
> thrown away, while only sending through the relevant commits which
> satisfy some criteria (signed, etc.).

This looks like a possibility, thanks! I'll likely try to implement it
this way if there can be no consensus on the features wanted for a
fetch-hook (as the current state of discussion looks like to me).

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10 18:36                         ` Fetch-hooks Leo Gaspard
  2018-02-12 19:23                           ` Fetch-hooks Brandon Williams
@ 2018-02-14  1:35                           ` Jeff King
  2018-02-14  2:02                             ` Fetch-hooks Leo Gaspard
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff King @ 2018-02-14  1:35 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On Sat, Feb 10, 2018 at 07:36:47PM +0100, Leo Gaspard wrote:

> Hmm... would this also drown the remote.<name>.fetch map? Also, I think
> this behavior could be emulated with fetch and fetchTagsTo, and it would
> look like:
> [remote "my-remote"]
>     fetch = +refs/heads/*:refs/remotes/my-remote/heads/*
>     fetchTagsTo = refs/remotes/my-remote/tags/*
> The remaining issue being to teach the lookup side to look for tags in
> all the remote-tag namespaces (and the fact it's a breaking change).

Right, I think fetching into the right spots is the easy part. Designing
the new lookup rules is the tricky part.

If you're really interested in the gory details, here's a very old
discussion on it:

  https://public-inbox.org/git/AANLkTi=yFwOAQMHhvLsB1_xmYOE9HHP2YB4H4TQzwwc8@mail.gmail.com/

I think there may have been some more concrete proposals after that, but
that's what I was able to dig up quickly.

> That said, actually I just noticed an issue in the “add a
> remote.<name>.fetch option to fetch to refs/quarantine then have the
> post-fetch hook do the work”: it means if I run `git pull`, then:
>  1. The remote references will be pulled to refs/quarantine/...
>  2. The post-fetch hook will copy the accepted ones to refs/remotes/...
>  3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into
> local branches... and so merge from refs/quarantine.

Good point. You can't munge FETCH_HEAD by playing with refspecs.

I am starting to come around to the idea that "pre-fetch" might be the
best way to do what you want. Not to rewrite refs, but perhaps to simply
reject them. In the same way that we allow pre-receive to reject pushed
refs (both are, after all, the final check on admitting new history into
the repository, just in opposite directions).

> So, when thinking about it, I'm back to thinking the proper hook
> interface should be the one of the tweak-fetch hook, but its
> implementation should make it not go crazy on remote servers. And so
> that the implementation should do all this refs/quarantine wizardry
> inside git itself.

So does anybody actually want to be able to adjust the refs as they pass
through? It really sounds like you just want to be able to reject or not
reject the fetch. And that rejecting would be the uncommon case, so it's
OK to just abort the whole operation and expect the user to figure it
out.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-12 19:23                           ` Fetch-hooks Brandon Williams
  2018-02-13 15:44                             ` Fetch-hooks Leo Gaspard
@ 2018-02-14  1:38                             ` Jeff King
  1 sibling, 0 replies; 38+ messages in thread
From: Jeff King @ 2018-02-14  1:38 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Leo Gaspard, Ævar Arnfjörð Bjarmason, Joey Hess,
	git, Junio C Hamano

On Mon, Feb 12, 2018 at 11:23:27AM -0800, Brandon Williams wrote:

> Maybe this isn't helpful but you may be able to implement this by using
> a remote-helper.  The helper could perform any sort of caching it needed
> to prevent re-downloading large amounts of data that is potentially
> thrown away, while only sending through the relevant commits which
> satisfy some criteria (signed, etc.).

Interesting idea, though it would still be calling git-fetch under the
hood, right? So I guess we're just reimplementing this "quarantine"
concept, but in theory we'd have the flexibility to store that
quarantine in another one-off sub-repository (instead of a special ref
namespace, though I suppose you could use the special ref namespace,
too).

I suspect it could work, but there would probably be a lot of rough
edges.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-10 12:21                       ` Fetch-hooks Jeff King
  2018-02-10 18:36                         ` Fetch-hooks Leo Gaspard
@ 2018-02-14  1:46                         ` Jacob Keller
  1 sibling, 0 replies; 38+ messages in thread
From: Jacob Keller @ 2018-02-14  1:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Leo Gaspard, Ævar Arnfjörð Bjarmason, Joey Hess,
	Git mailing list, Brandon Williams, Junio C Hamano

On Sat, Feb 10, 2018 at 4:21 AM, Jeff King <peff@peff.net> wrote:
> On Sat, Feb 10, 2018 at 01:37:20AM +0100, Leo Gaspard wrote:
>
>> > Yeah, tag-following may be a little tricky, because it usually wants to
>> > write to refs/tags/. One workaround would be to have your config look
>> > like this:
>> >
>> >   [remote "origin"]
>> >   fetch = +refs/heads/*:refs/quarantine/origin/heads/*
>> >   fetch = +refs/tags/*:refs/quarantine/origin/tags/*
>> >   tagOpt = --no-tags
>> >
>> > That's not exactly the same thing, because it would fetch all tags, not
>> > just those that point to the history on the branches. But in most
>> > repositories and workflows the distinction doesn't matter.
>>
>> Hmm... apart from the implementation complexity (of which I have no
>> idea), is there an argument against the idea of adding a
>> remote.<name>.fetchTagsTo refmap similar to remote.<name>.fetch but used
>> every time a tag is fetched? (well, maybe not exactly similar to
>> remote.<name>.fetch because we know the source is going to be
>> refs/tags/*; so just having the part of .fetch past the ':' would be
>> more like what's needed I guess)
>
> I don't think it would be too hard to implement, and is at least
> logically consistent with the way we handle tags.
>
> If we were designing from scratch, I do think this would all be helped
> immensely by having more separation of refs fetched from remotes. There
> was a proposal in the v1.8 era to fetch everything for a remote,
> including tags, into "refs/remotes/origin/heads/",
> "refs/remotes/origin/tags/", etc. And then we'd teach the lookup side to
> look for tags in each of the remote-tag namespaces.
>
> I still think that would be a good direction to go, but it would be a
> big project (which is why the original stalled).
>

I want to go this direction, but the difficulty has always been
limited time to actually work on such a large project.

Thanks,
Jake

>> The issue with your solution is that if the user runs 'git fetch
>> --tags', he will get the (potentially compromised) tags directly in his
>> refs/tags/.
>
> True.
>
> -Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-14  1:35                           ` Fetch-hooks Jeff King
@ 2018-02-14  2:02                             ` Leo Gaspard
  2018-02-19 21:23                               ` Fetch-hooks Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Gaspard @ 2018-02-14  2:02 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On 02/14/2018 02:35 AM, Jeff King wrote:
> On Sat, Feb 10, 2018 at 07:36:47PM +0100, Leo Gaspard wrote:
>> [...]
> I think there may have been some more concrete proposals after that, but
> that's what I was able to dig up quickly.

Thanks! Though it looks way above my knowledge of git internals as well
as time available, it looks like a great project, and I hope it someday
succeeds!

>> That said, actually I just noticed an issue in the “add a
>> remote.<name>.fetch option to fetch to refs/quarantine then have the
>> post-fetch hook do the work”: it means if I run `git pull`, then:
>>  1. The remote references will be pulled to refs/quarantine/...
>>  2. The post-fetch hook will copy the accepted ones to refs/remotes/...
>>  3. The `git merge FETCH_HEAD` called by pull will merge FETCH_HEAD into
>> local branches... and so merge from refs/quarantine.
> 
> Good point. You can't munge FETCH_HEAD by playing with refspecs.
> 
> I am starting to come around to the idea that "pre-fetch" might be the
> best way to do what you want. Not to rewrite refs, but perhaps to simply
> reject them. In the same way that we allow pre-receive to reject pushed
> refs (both are, after all, the final check on admitting new history into
> the repository, just in opposite directions).
> 
>> So, when thinking about it, I'm back to thinking the proper hook
>> interface should be the one of the tweak-fetch hook, but its
>> implementation should make it not go crazy on remote servers. And so
>> that the implementation should do all this refs/quarantine wizardry
>> inside git itself.
> 
> So does anybody actually want to be able to adjust the refs as they pass
> through? It really sounds like you just want to be able to reject or not
> reject the fetch. And that rejecting would be the uncommon case, so it's
> OK to just abort the whole operation and expect the user to figure it
> out.

This completely fits my use case (modulo the fact that it's more similar
to the `update` hook than to `pre-receive` I think, as verifying the
signature requires the objects to already have been downloaded), indeed,
though I'm not sure it would have fit Joey's (based on my understanding,
adding a merge was what was originally asked for).

Actually, I'm wondering if the existing semantics of `update` could not
be reused for the `pre-fetch`. Would it make sense to just call `update`
during a fetch in the same way as during a receive-pack? That said it
likely makes this a breaking change, though it's maybe unlikely that a
repository is used both for fetching and for receive-pack'ing, it could
happen.

So the proposal as I understand it would currently be adding a
`fetch-update` hook that does exactly the same thing as `update` but for
`fetch`. This solves my use case in a nice way, though it likely doesn't
solve Joey's original one (which has been taken care of by a wrapper
since then).

What do you all think about it?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-14  2:02                             ` Fetch-hooks Leo Gaspard
@ 2018-02-19 21:23                               ` Jeff King
  2018-02-19 22:50                                 ` Fetch-hooks Leo Gaspard
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2018-02-19 21:23 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On Wed, Feb 14, 2018 at 03:02:00AM +0100, Leo Gaspard wrote:

> > So does anybody actually want to be able to adjust the refs as they pass
> > through? It really sounds like you just want to be able to reject or not
> > reject the fetch. And that rejecting would be the uncommon case, so it's
> > OK to just abort the whole operation and expect the user to figure it
> > out.
> 
> This completely fits my use case (modulo the fact that it's more similar
> to the `update` hook than to `pre-receive` I think, as verifying the
> signature requires the objects to already have been downloaded), indeed,
> though I'm not sure it would have fit Joey's (based on my understanding,
> adding a merge was what was originally asked for).
> 
> Actually, I'm wondering if the existing semantics of `update` could not
> be reused for the `pre-fetch`. Would it make sense to just call `update`
> during a fetch in the same way as during a receive-pack? That said it
> likely makes this a breaking change, though it's maybe unlikely that a
> repository is used both for fetching and for receive-pack'ing, it could
> happen.
> 
> So the proposal as I understand it would currently be adding a
> `fetch-update` hook that does exactly the same thing as `update` but for
> `fetch`. This solves my use case in a nice way, though it likely doesn't
> solve Joey's original one (which has been taken care of by a wrapper
> since then).
> 
> What do you all think about it?

I think it should be a separate hook; having an existing hook trigger in
new places is likely to cause confusion and regressions.

If you do go this route, please model it after "pre-receive" rather than
"update". We had "update" originally but found it was too limiting for
hooks to see only one ref at a time. So we introduced pre-receive. The
"update" hook remains for historical reasons, but I don't think we'd
want to reproduce the mistake. :)

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-19 21:23                               ` Fetch-hooks Jeff King
@ 2018-02-19 22:50                                 ` Leo Gaspard
  2018-02-20  6:10                                   ` Fetch-hooks Jacob Keller
  2018-02-20  7:42                                   ` Fetch-hooks Jeff King
  0 siblings, 2 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-19 22:50 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On 02/19/2018 10:23 PM, Jeff King wrote:
> [...]
> If you do go this route, please model it after "pre-receive" rather than
> "update". We had "update" originally but found it was too limiting for
> hooks to see only one ref at a time. So we introduced pre-receive. The
> "update" hook remains for historical reasons, but I don't think we'd
> want to reproduce the mistake. :)

Hmm, what bothered me with “pre-receive” was that it was an
all-or-nothing decision, without the ability to allow some references
through and not others.

Is there a way for “pre-receive” to individually filter hooks? I was
under the impression that the only way to do that was to use the
“update” hook, which was the reason I wanted to model it after “update”
rather than “pre-receive” (my use case being a check independent for
each pushed ref)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-19 22:50                                 ` Fetch-hooks Leo Gaspard
@ 2018-02-20  6:10                                   ` Jacob Keller
  2018-02-20  7:42                                   ` Fetch-hooks Jeff King
  1 sibling, 0 replies; 38+ messages in thread
From: Jacob Keller @ 2018-02-20  6:10 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Joey Hess,
	Git mailing list, Brandon Williams, Junio C Hamano

On Mon, Feb 19, 2018 at 2:50 PM, Leo Gaspard <leo@gaspard.io> wrote:
> On 02/19/2018 10:23 PM, Jeff King wrote:
>> [...]
>> If you do go this route, please model it after "pre-receive" rather than
>> "update". We had "update" originally but found it was too limiting for
>> hooks to see only one ref at a time. So we introduced pre-receive. The
>> "update" hook remains for historical reasons, but I don't think we'd
>> want to reproduce the mistake. :)
>
> Hmm, what bothered me with “pre-receive” was that it was an
> all-or-nothing decision, without the ability to allow some references
> through and not others.
>
> Is there a way for “pre-receive” to individually filter hooks? I was
> under the impression that the only way to do that was to use the
> “update” hook, which was the reason I wanted to model it after “update”
> rather than “pre-receive” (my use case being a check independent for
> each pushed ref)

At least in the "push" case I think there is value in saying "if one
ref fails, please fail the entire push, always". This makes sense,
because if a user happens to violate some rule in one thing, they very
likely may not wish any of the push to succeed, and it also creates
problems with whether a push is atomic or not.

For fetch, I could see either method being useful.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-19 22:50                                 ` Fetch-hooks Leo Gaspard
  2018-02-20  6:10                                   ` Fetch-hooks Jacob Keller
@ 2018-02-20  7:42                                   ` Jeff King
  2018-02-20 21:19                                     ` Fetch-hooks Leo Gaspard
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff King @ 2018-02-20  7:42 UTC (permalink / raw)
  To: Leo Gaspard
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On Mon, Feb 19, 2018 at 11:50:37PM +0100, Leo Gaspard wrote:

> On 02/19/2018 10:23 PM, Jeff King wrote:
> > [...]
> > If you do go this route, please model it after "pre-receive" rather than
> > "update". We had "update" originally but found it was too limiting for
> > hooks to see only one ref at a time. So we introduced pre-receive. The
> > "update" hook remains for historical reasons, but I don't think we'd
> > want to reproduce the mistake. :)
> 
> Hmm, what bothered me with “pre-receive” was that it was an
> all-or-nothing decision, without the ability to allow some references
> through and not others.
> 
> Is there a way for “pre-receive” to individually filter hooks? I was
> under the impression that the only way to do that was to use the
> “update” hook, which was the reason I wanted to model it after “update”
> rather than “pre-receive” (my use case being a check independent for
> each pushed ref)

No, pre-receive is always atomic. However, that may actually be what you
want, if the point is to keep untrusted data out of the repository. By
rejecting the whole thing, we could in theory keep the objects from
entering the repository at all. This is how the push side works via the
"quarantine" system (which is explained in git-receive-pack(1)).
Fetching doesn't currently quarantine objects, but it probably wouldn't
be very hard to make it do so.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Fetch-hooks
  2018-02-20  7:42                                   ` Fetch-hooks Jeff King
@ 2018-02-20 21:19                                     ` Leo Gaspard
  0 siblings, 0 replies; 38+ messages in thread
From: Leo Gaspard @ 2018-02-20 21:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Joey Hess, git,
	Brandon Williams, Junio C Hamano

On 02/20/2018 08:42 AM, Jeff King wrote:>> [...]
>>
>> Is there a way for “pre-receive” to individually filter [refs]? I was
>> under the impression that the only way to do that was to use the
>> “update” hook, which was the reason I wanted to model it after “update”
>> rather than “pre-receive” (my use case being a check independent for
>> each pushed ref)
> 
> No, pre-receive is always atomic. However, that may actually be what you
> want, if the point is to keep untrusted data out of the repository. By
> rejecting the whole thing, we could in theory keep the objects from
> entering the repository at all. This is how the push side works via the
> "quarantine" system (which is explained in git-receive-pack(1)).
> Fetching doesn't currently quarantine objects, but it probably wouldn't
> be very hard to make it do so.

Oh, I didn't think about quarantining behavior, indeed!

So I guess, following your answer as well as Jake's, I'll try to
implement a pre-receive-like hook, and will come back to this list when
I'll have a tentative implementation. Thanks for the advice! :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, back to index

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-07 21:56 Fetch-hooks Leo Gaspard
2018-02-07 22:51 ` Fetch-hooks Ævar Arnfjörð Bjarmason
2018-02-08  0:06   ` Fetch-hooks Leo Gaspard
2018-02-08 15:30     ` Fetch-hooks Joey Hess
2018-02-08 17:02       ` Fetch-hooks Leo Gaspard
2018-02-08 21:06         ` Fetch-hooks Ævar Arnfjörð Bjarmason
2018-02-08 22:18           ` Fetch-hooks Leo Gaspard
2018-02-09 22:04             ` Fetch-hooks Ævar Arnfjörð Bjarmason
2018-02-09 22:24               ` Fetch-hooks Leo Gaspard
2018-02-09 22:56                 ` Fetch-hooks Ævar Arnfjörð Bjarmason
2018-02-09 22:30               ` Fetch-hooks Jeff King
2018-02-09 22:45                 ` Fetch-hooks Junio C Hamano
2018-02-09 23:49                 ` Fetch-hooks Leo Gaspard
2018-02-10  0:13                   ` Fetch-hooks Jeff King
2018-02-10  0:37                     ` Fetch-hooks Leo Gaspard
2018-02-10  1:08                       ` Fetch-hooks Junio C Hamano
2018-02-10  1:33                         ` Fetch-hooks Leo Gaspard
2018-02-10 18:03                           ` Fetch-hooks Leo Gaspard
2018-02-10 12:21                       ` Fetch-hooks Jeff King
2018-02-10 18:36                         ` Fetch-hooks Leo Gaspard
2018-02-12 19:23                           ` Fetch-hooks Brandon Williams
2018-02-13 15:44                             ` Fetch-hooks Leo Gaspard
2018-02-14  1:38                             ` Fetch-hooks Jeff King
2018-02-14  1:35                           ` Fetch-hooks Jeff King
2018-02-14  2:02                             ` Fetch-hooks Leo Gaspard
2018-02-19 21:23                               ` Fetch-hooks Jeff King
2018-02-19 22:50                                 ` Fetch-hooks Leo Gaspard
2018-02-20  6:10                                   ` Fetch-hooks Jacob Keller
2018-02-20  7:42                                   ` Fetch-hooks Jeff King
2018-02-20 21:19                                     ` Fetch-hooks Leo Gaspard
2018-02-14  1:46                         ` Fetch-hooks Jacob Keller
2018-02-09 19:12         ` Fetch-hooks Leo Gaspard
2018-02-09 20:20           ` Fetch-hooks Joey Hess
2018-02-09 21:28             ` [PATCH 0/2] fetch: add tweak-fetch hook Leo Gaspard
2018-02-09 21:44               ` [PATCH 1/2] fetch: preparations for " Leo Gaspard
2018-02-09 21:44                 ` [PATCH 2/2] fetch: add " Leo Gaspard
2018-02-09 22:40                   ` Junio C Hamano
2018-02-09 22:34                 ` [PATCH 1/2] fetch: preparations for " Junio C Hamano

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox