git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
@ 2015-10-26  8:09 Lukas Fleischer
  2015-10-26 19:58 ` Junio C Hamano
  2015-10-28 15:42 ` [PATCH] Allow hideRefs to match " Lukas Fleischer
  0 siblings, 2 replies; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-26  8:09 UTC (permalink / raw
  To: git

Right now, we always advertise all refs as ".have", even those outside
the current namespace. This leads to problems when trying to push to a
repository with a huge number of namespaces from a slow connection.

Add a configuration option receive.advertiseAllRefs that can be used to
determine whether refs outside the current namespace should be
advertised or not.

Signed-off-by: Lukas Fleischer <lfleischer@lfos.de>
---
We are using Git namespaces to store a huge number of (virtual)
repositories inside a shared repository. While the blobs in the virtual
repositories are fairly similar, they do not share any refs, so
advertising any refs outside the current namespace is undesirable. See
the discussion on [1] for details.

Note that this patch is just a draft: I didn't do any testing, apart
from checking that it compiles. I would like to hear some opinions
before sending a polished version.

Is our use case considered common enough to justify the inclusion of
such a configuration option in mainline?

Are there suggestions for a better name for the option? Ideally, it
should contain the word "namespace" but I could not come up with
something sensible that is short enough.

[1] https://lists.archlinux.org/pipermail/aur-general/2015-October/031596.html

 Documentation/config.txt |  6 ++++++
 builtin/receive-pack.c   | 31 +++++++++++++++++++++----------
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 315f271..aa101a7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2201,6 +2201,12 @@ receive.advertiseAtomic::
 	capability to its clients. If you don't want to this capability
 	to be advertised, set this variable to false.
 
+receive.advertiseAllRefs::
+	By default, git-receive-pack will advertise all refs, even those
+	outside the current namespace, so that the client can use them to
+	minimize data transfer. If you only want to advertise refs from the
+	active namespace to be advertised, set this variable to false.
+
 receive.autogc::
 	By default, git-receive-pack will run "git-gc --auto" after
 	receiving data from git-push and updating refs.  You can stop
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e6b93d0..ea9a820 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -41,6 +41,7 @@ static struct strbuf fsck_msg_types = STRBUF_INIT;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int advertise_atomic_push = 1;
+static int advertise_all_refs = 1;
 static int unpack_limit = 100;
 static int report_status;
 static int use_sideband;
@@ -190,6 +191,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.advertiseallrefs") == 0) {
+		advertise_all_refs = git_config_bool(var, value);
+		return 0;
+	}
+
 	return git_default_config(var, value, cb);
 }
 
@@ -222,16 +228,21 @@ static void show_ref(const char *path, const unsigned char *sha1)
 static int show_ref_cb(const char *path, const struct object_id *oid, int flag, void *unused)
 {
 	path = strip_namespace(path);
-	/*
-	 * Advertise refs outside our current namespace as ".have"
-	 * refs, so that the client can use them to minimize data
-	 * transfer but will otherwise ignore them. This happens to
-	 * cover ".have" that are thrown in by add_one_alternate_ref()
-	 * to mark histories that are complete in our alternates as
-	 * well.
-	 */
-	if (!path)
-		path = ".have";
+	if (!path) {
+		if (advertise_all_refs) {
+			/*
+			 * Advertise refs outside our current namespace as
+			 * ".have" refs, so that the client can use them to
+			 * minimize data transfer but will otherwise ignore
+			 * them. This happens to cover ".have" that are thrown
+			 * in by add_one_alternate_ref() to mark histories that
+			 * are complete in our alternates as well.
+			 */
+			path = ".have";
+		} else {
+			return 0;
+		}
+	}
 	show_ref(path, oid->hash);
 	return 0;
 }
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-26  8:09 [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace Lukas Fleischer
@ 2015-10-26 19:58 ` Junio C Hamano
       [not found]   ` <20151027053916.3030.8259@typhoon.lan>
  2015-10-28 15:42 ` [PATCH] Allow hideRefs to match " Lukas Fleischer
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-26 19:58 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git

Is there a reason why transfer.hiderefs is not sufficient?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
       [not found]     ` <20151027055911.4877.94179@typhoon.lan>
@ 2015-10-27 14:32       ` Lukas Fleischer
  2015-10-27 18:18         ` Junio C Hamano
  2015-10-30 21:31         ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-27 14:32 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Tue, 27 Oct 2015 at 06:59:11, Lukas Fleischer wrote:
> [...]
> On second thought, it might be possible to overwrite the value of
> transfer.hiderefs using the -c command line option. If we combine that
> with the negative patterns supported by hiderefs, we might get a
> solution that is clean and that avoids race conditions. I will check
> whether that works with git-http-backend as well and will report back.
> 
> Thanks for the pointer!

Using receive.hideRefs seems to work but there are two minor issues:

1. There does not seem to be a way to pass configuration parameters to
   git-shell commands. Right now, the only way to work around this seems
   to write a wrapper script around git-shell that catches
   git-receive-pack commands and executes something like
   
       git -c receive.hideRefs=[...] receive-pack [...]
   
   instead of forwarding those commands to git-shell. How about allowing
   to overwrite configuration parameters via an environment variable?
   Has that been discussed before?

2. transfer.hideRefs and receive.hideRefs do not seem to work with Git
   namespaces in general. show_ref_cb() replaces each ref outside the
   current namespace with ".have" before passing it to show_ref() which
   in turn performs the ref_is_hidden() check. This has the nice side
   effect that receive.hideRefs=.have does exactly what I want, however
   it also means that hideRefs feature does not allow for excluding only
   specific tags outside the current namespace. Is that intended? Can we
   rely on Git always looking for ".have" in the hideRefs list in this
   case? Should the documentation be updated?

Regards,
Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-27 14:32       ` Lukas Fleischer
@ 2015-10-27 18:18         ` Junio C Hamano
  2015-10-28  7:00           ` Lukas Fleischer
  2015-10-30 21:31         ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-27 18:18 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git

Lukas Fleischer <lfleischer@lfos.de> writes:

> 2. transfer.hideRefs and receive.hideRefs do not seem to work with Git
>    namespaces in general. show_ref_cb() replaces each ref outside the
>    current namespace with ".have" before passing it to show_ref() which
>    in turn performs the ref_is_hidden() check. This has the nice side
>    effect that receive.hideRefs=.have does exactly what I want, however
>    it also means that hideRefs feature does not allow for excluding only
>    specific tags outside the current namespace. Is that intended? Can we
>    rely on Git always looking for ".have" in the hideRefs list in this
>    case?

When I asked 'Is transfer.hiderefs insufficient?', I wasn't
expecting it to be usable out of box.  It was a suggestion to build
on top of it, instead of adding a parallel support for something
specific to namespaces.

For example, if the problem is that you cannot tell ref_is_hidden()
what namespace the ref is from because it is called after running
strip_namespace(), perhaps you can find a way to have the original
"namespaced ref" specified on transfer.hiderefs and match them?
Then in repository for project A, namespaced refs for project B can
be excluded by specifying refs/namespaces/B/* on transfer.hiderefs.

Perhaps along the lines of this?

 builtin/receive-pack.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index bcb624b..db0a99d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -221,6 +221,15 @@ static void show_ref(const char *path, const unsigned char *sha1)
 
 static int show_ref_cb(const char *path, const struct object_id *oid, int flag, void *unused)
 {
+	const char *ns = get_git_namespace();
+
+	/*
+	 * Give the "hiderefs" mechanism a chance to inspect and
+	 * reject the namespaced ref itself.
+	 */
+	if (ns[0] && ref_is_hidden(path))
+		return 0;
+
 	path = strip_namespace(path);
 	/*
 	 * Advertise refs outside our current namespace as ".have"

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-27 18:18         ` Junio C Hamano
@ 2015-10-28  7:00           ` Lukas Fleischer
  2015-10-28 13:42             ` Jeff King
  2015-10-28 15:48             ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-28  7:00 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Tue, 27 Oct 2015 at 19:18:26, Junio C Hamano wrote:
> [...]
> When I asked 'Is transfer.hiderefs insufficient?', I wasn't
> expecting it to be usable out of box.  It was a suggestion to build
> on top of it, instead of adding a parallel support for something
> specific to namespaces.
> 

Agreed, and I do have a couple of patches to improve hideRefs. I still
have some questions before submitting them, though. See below.

> For example, if the problem is that you cannot tell ref_is_hidden()
> what namespace the ref is from because it is called after running
> strip_namespace(), perhaps you can find a way to have the original
> "namespaced ref" specified on transfer.hiderefs and match them?
> Then in repository for project A, namespaced refs for project B can
> be excluded by specifying refs/namespaces/B/* on transfer.hiderefs.
> 
> Perhaps along the lines of this?
> [...]

My original question remains: Do we want to continue supporting things
like transfer.hideRefs=.have (which currently magically hides all refs
outside the current namespace)? For 100% backwards compatibility, we
would have to. On the other hand, one could consider the current
behavior a bug and one could argue that it is weird enough that probably
nobody (apart from me) relies on it right now. If we decide to keep it
anyway, I think it should be documented.

Another patch I have in my patch queue adds support for a whitelist mode
to hideRefs. There are several ways to implement that:

1. Make transfer.hideRefs='' hide all refs (it currently does not). The
   user can then whitelist refs explicitly using negative patterns
   below that rule. This is how my current implementation works. Using
   the empty string seemed most natural since hideRefs matches prefixes
   and every string has the empty string as a prefix. If that seems too
   weird, we could probably special case something like
   transfer.hideRefs='*' instead.

2. Detect whether hideRefs only contains negative patterns. Switch to
   whitelist mode ("hide by default") in that case.

3. Add another option to switch between "hide by default" and "show by
   default".

I personally prefer the first option. Any other opinions?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-28  7:00           ` Lukas Fleischer
@ 2015-10-28 13:42             ` Jeff King
  2015-10-28 15:48             ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2015-10-28 13:42 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: Junio C Hamano, git

On Wed, Oct 28, 2015 at 08:00:45AM +0100, Lukas Fleischer wrote:

> My original question remains: Do we want to continue supporting things
> like transfer.hideRefs=.have (which currently magically hides all refs
> outside the current namespace)? For 100% backwards compatibility, we
> would have to. On the other hand, one could consider the current
> behavior a bug and one could argue that it is weird enough that probably
> nobody (apart from me) relies on it right now. If we decide to keep it
> anyway, I think it should be documented.

I don't think that hiding ".have" refs at that level is especially
useful. I do not use namespaces, but I do use alternates extensively,
and that is the original source of these ".have" refs. But filtering
them at the advertisement layer is very inefficient, as it is expensive
to get the list in the first place (we spawn ls-remote, which spawns
upload-pack in the alternate!). So we'd want to prevent that process
much earlier.

I have an unpublished patch to specially disable alternates
advertisement entirely (i.e., adding a new boolean config,
receive.advertiseAlternates). In my case, it is because the alternates
repositories have huge numbers of refs (sometimes ranging into the
gigabytes) and the performance hit on even loading that packed-refs file
is too large.

I suppose that behavior _could_ be triggered by ".have" appearing in the
hiderefs config, though (i.e., before accessing the alternate, check
ref_is_hidden(".have")). That seems a bit too subtle to me, though.

> Another patch I have in my patch queue adds support for a whitelist mode
> to hideRefs. There are several ways to implement that:
> 
> 1. Make transfer.hideRefs='' hide all refs (it currently does not). The
>    user can then whitelist refs explicitly using negative patterns
>    below that rule. This is how my current implementation works. Using
>    the empty string seemed most natural since hideRefs matches prefixes
>    and every string has the empty string as a prefix. If that seems too
>    weird, we could probably special case something like
>    transfer.hideRefs='*' instead.
> 
> 2. Detect whether hideRefs only contains negative patterns. Switch to
>    whitelist mode ("hide by default") in that case.
> 
> 3. Add another option to switch between "hide by default" and "show by
>    default".
> 
> I personally prefer the first option. Any other opinions?

I am just a bystander and would not use this myself, but I think the 1st
is the least ugly. I am not sure why ignoring "refs/" does not work,
though (it does not catch ".have", of course, but I think that is a
feature; there are a finite set of pseudo-refs, so you can ignore those,
too, if you want).

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] Allow hideRefs to match refs outside the namespace
  2015-10-26  8:09 [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace Lukas Fleischer
  2015-10-26 19:58 ` Junio C Hamano
@ 2015-10-28 15:42 ` Lukas Fleischer
  2015-10-28 16:21   ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-28 15:42 UTC (permalink / raw
  To: git

Right now, refs with a path outside the current namespace are replaced
by ".have" before passing them to show_ref() which in turn checks
whether the ref matches the hideRefs pattern. Move the check before the
path substitution in show_ref_cb() such that the hideRefs feature can be
used to hide specific refs outside the current namespace.

Signed-off-by: Lukas Fleischer <lfleischer@lfos.de>
---
The other show_ref() call sites are in show_one_alternate_sha1() and in
write_head_info(). The call site in show_one_alternate_sha1() is for
alternates and passes ".have". The other one is

    show_ref("capabilities^{}", null_sha1);

and is not relevant to the hideRefs feature. Note that this kind of
breaks backwards compatibility since the "magic" hideRefs patterns
".have" and "capabilities^{}" no longer work, as explained in the
discussion.

 builtin/receive-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index bcb624b..4a5d0ae 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -195,9 +195,6 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 
 static void show_ref(const char *path, const unsigned char *sha1)
 {
-	if (ref_is_hidden(path))
-		return;
-
 	if (sent_capabilities) {
 		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
 	} else {
@@ -221,6 +218,9 @@ static void show_ref(const char *path, const unsigned char *sha1)
 
 static int show_ref_cb(const char *path, const struct object_id *oid, int flag, void *unused)
 {
+	if (ref_is_hidden(path))
+		return 0;
+
 	path = strip_namespace(path);
 	/*
 	 * Advertise refs outside our current namespace as ".have"
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-28  7:00           ` Lukas Fleischer
  2015-10-28 13:42             ` Jeff King
@ 2015-10-28 15:48             ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2015-10-28 15:48 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git

Lukas Fleischer <lfleischer@lfos.de> writes:

> Another patch I have in my patch queue adds support for a whitelist mode
> to hideRefs. There are several ways to implement that:
>
> 1. Make transfer.hideRefs='' hide all refs (it currently does not). The

Hmph, that even sounds like a bug.  parse_hide_refs_config() does
not seem to reject ref[] whose length is zero, and ref_is_hidden()
would just check "starts_with(refname, match)" with an empty string
as "match", so I would naively have expected that to work already.

Ahh, there is "if refname[len] is at the end or slash boundary"
check after that.  You're right--you'd need to tweak that one for it
to work.

>    user can then whitelist refs explicitly using negative patterns
>    below that rule. This is how my current implementation works.

That sounds like a good way to go.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Allow hideRefs to match refs outside the namespace
  2015-10-28 15:42 ` [PATCH] Allow hideRefs to match " Lukas Fleischer
@ 2015-10-28 16:21   ` Junio C Hamano
  2015-10-31  8:49     ` Lukas Fleischer
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-28 16:21 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git

Lukas Fleischer <lfleischer@lfos.de> writes:

> Right now, refs with a path outside the current namespace are replaced
> by ".have" before passing them to show_ref() which in turn checks
> whether the ref matches the hideRefs pattern. Move the check before the
> path substitution in show_ref_cb() such that the hideRefs feature can be
> used to hide specific refs outside the current namespace.
>
> Signed-off-by: Lukas Fleischer <lfleischer@lfos.de>
> ---
> The other show_ref() call sites are in show_one_alternate_sha1() and in
> write_head_info(). The call site in show_one_alternate_sha1() is for
> alternates and passes ".have". The other one is
>
>     show_ref("capabilities^{}", null_sha1);
>
> and is not relevant to the hideRefs feature. Note that this kind of
> breaks backwards compatibility since the "magic" hideRefs patterns
> ".have" and "capabilities^{}" no longer work, as explained in the
> discussion.

If somebody is using namespaces and has "refs/frotz/" in the
hiderefs configuration, we hide refs/frotz/ no matter which
namespace is being accessed.  With this change, with the removal the
check from show_ref(), wouldn't such a repository suddenly see a
behaviour change?

>  builtin/receive-pack.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index bcb624b..4a5d0ae 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -195,9 +195,6 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
>  
>  static void show_ref(const char *path, const unsigned char *sha1)
>  {
> -	if (ref_is_hidden(path))
> -		return;
> -
>  	if (sent_capabilities) {
>  		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
>  	} else {
> @@ -221,6 +218,9 @@ static void show_ref(const char *path, const unsigned char *sha1)
>  
>  static int show_ref_cb(const char *path, const struct object_id *oid, int flag, void *unused)
>  {
> +	if (ref_is_hidden(path))
> +		return 0;
> +
>  	path = strip_namespace(path);
>  	/*
>  	 * Advertise refs outside our current namespace as ".have"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-27 14:32       ` Lukas Fleischer
  2015-10-27 18:18         ` Junio C Hamano
@ 2015-10-30 21:31         ` Junio C Hamano
  2015-10-30 21:46           ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-30 21:31 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git, Jeff King

Lukas Fleischer <lfleischer@lfos.de> writes:

> 1. There does not seem to be a way to pass configuration parameters to
>    git-shell commands. Right now, the only way to work around this seems
>    to write a wrapper script around git-shell that catches
>    git-receive-pack commands and executes something like
>    
>        git -c receive.hideRefs=[...] receive-pack [...]
>    
>    instead of forwarding those commands to git-shell.

This part we have never discussed in the thread, I think.  Why do
you need to override, instead of having these in the repository's
config files?

Is it because a repository may host multiple pseudo repositories in
the form of "namespaces" but they must share the same config file,
and you would want to customize per "namespace"?

For that we may want to enhance the [include] mechanism.  Something
like

	[include "namespace=foo"]
        	path = /path/to/foo/specific/config.txt

	[include "namespace=bar"]
        	path = /path/to/bar/specific/config.txt

Cc'ing Peff as we have discussed this kind of conditional inclusion
in the past...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-30 21:31         ` Junio C Hamano
@ 2015-10-30 21:46           ` Jeff King
  2015-10-31  9:03             ` Lukas Fleischer
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2015-10-30 21:46 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Lukas Fleischer, git

On Fri, Oct 30, 2015 at 02:31:28PM -0700, Junio C Hamano wrote:

> Lukas Fleischer <lfleischer@lfos.de> writes:
> 
> > 1. There does not seem to be a way to pass configuration parameters to
> >    git-shell commands. Right now, the only way to work around this seems
> >    to write a wrapper script around git-shell that catches
> >    git-receive-pack commands and executes something like
> >    
> >        git -c receive.hideRefs=[...] receive-pack [...]
> >    
> >    instead of forwarding those commands to git-shell.
> 
> This part we have never discussed in the thread, I think.  Why do
> you need to override, instead of having these in the repository's
> config files?
> 
> Is it because a repository may host multiple pseudo repositories in
> the form of "namespaces" but they must share the same config file,
> and you would want to customize per "namespace"?
> 
> For that we may want to enhance the [include] mechanism.  Something
> like
> 
> 	[include "namespace=foo"]
>         	path = /path/to/foo/specific/config.txt
> 
> 	[include "namespace=bar"]
>         	path = /path/to/bar/specific/config.txt
> 
> Cc'ing Peff as we have discussed this kind of conditional inclusion
> in the past...

Yeah, that sort of conditional matching is exactly what I had intended
for the "subsection" of include to be. We just haven't come up with a
good condition to act as our first use case. :)

I am happy with any syntax that does not paint us into a corner (and
your example above looks fine, assuming we could later add other keys on
the left-hand of the "=").

I am slightly confused, though, where the namespace is set in such a
git-shell example. I have no really used ref namespaces myself, but my
understanding is that they have to come from the environment. You can
similarly set config through the environment. I don't think we've ever
publicized that, but it is how "git -c" works. E.g.:

  $ git -c alias.foo='!env' -c another.option=true foo | grep GIT_
  GIT_CONFIG_PARAMETERS='alias.foo='\!'env' 'another.option=true'

I think it is very particular that you single-quote each item, though:

  $ GIT_CONFIG_PARAMETERS=foo.bar=true git config foo.bar
  error: bogus format in GIT_CONFIG_PARAMETERS
  fatal: unable to parse command-line config

  $ GIT_CONFIG_PARAMETERS="'foo.bar=true'" git config foo.bar
  true

So we may want to make it a little more friendly before truly
recommending it as an interface, but I don't think there is any
conceptual problem with doing so.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Allow hideRefs to match refs outside the namespace
  2015-10-28 16:21   ` Junio C Hamano
@ 2015-10-31  8:49     ` Lukas Fleischer
  2015-10-31 17:31       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-31  8:49 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

I wrote this email on Thursday but it seems like it did not make it
through the mailing list. Resubmitting...

On Wed, 28 Oct 2015 at 17:21:59, Junio C Hamano wrote:
> Lukas Fleischer <lfleischer@lfos.de> writes:
> 
> > Right now, refs with a path outside the current namespace are replaced
> > by ".have" before passing them to show_ref() which in turn checks
> > whether the ref matches the hideRefs pattern. Move the check before the
> > path substitution in show_ref_cb() such that the hideRefs feature can be
> > used to hide specific refs outside the current namespace.
> >
> > Signed-off-by: Lukas Fleischer <lfleischer@lfos.de>
> > ---
> > The other show_ref() call sites are in show_one_alternate_sha1() and in
> > write_head_info(). The call site in show_one_alternate_sha1() is for
> > alternates and passes ".have". The other one is
> >
> >     show_ref("capabilities^{}", null_sha1);
> >
> > and is not relevant to the hideRefs feature. Note that this kind of
> > breaks backwards compatibility since the "magic" hideRefs patterns
> > ".have" and "capabilities^{}" no longer work, as explained in the
> > discussion.
> 
> If somebody is using namespaces and has "refs/frotz/" in the
> hiderefs configuration, we hide refs/frotz/ no matter which
> namespace is being accessed.  With this change, with the removal the
> check from show_ref(), wouldn't such a repository suddenly see a
> behaviour change?
> [...]

It would indeed. However, we cannot stay 100% backwards compatible when
adding support for matching refs outside the current namespace without
introducing new syntax. For example, if Git namespaces are in use (i.e.
GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would
not have hidden refs/namespaces/foo/refs/bar before the change but it
does afterwards. You might argue that nobody would have added
"refs/namespaces/foo/refs/bar" to hideRefs in the first place but
namespaces can be nested and it might be that the user meant to hide
refs/namespaces/bar/refs/namespaces/foo/refs/bar instead. Yes, those are
weird corner cases. But then again, I think that using hideRefs with
namespaces already is a corner case as well. I also think that using the
same syntax to match both original and stripped refs is bad design. It
makes things complicated and the resulting feature doesn't have the full
expressive power of the simpler version only matching original refs.

So, we can either intentionally break backwards compatibility for some
rare corner cases, or keep the current behavior and introduce some new
syntax for matching the original (unstripped) refs. For the latter, we
could either introduce a new option ("hideUnstrippedRefs"?) or special
syntax inside hideRefs ("/refs/foo" instead of "refs/foo" for matching
the unstripped version only?) What do you think?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace
  2015-10-30 21:46           ` Jeff King
@ 2015-10-31  9:03             ` Lukas Fleischer
  0 siblings, 0 replies; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-31  9:03 UTC (permalink / raw
  To: Jeff King, Junio C Hamano; +Cc: git

On Fri, 30 Oct 2015 at 22:46:19, Jeff King wrote:
> On Fri, Oct 30, 2015 at 02:31:28PM -0700, Junio C Hamano wrote:
> 
> > Lukas Fleischer <lfleischer@lfos.de> writes:
> > 
> > > 1. There does not seem to be a way to pass configuration parameters to
> > >    git-shell commands. Right now, the only way to work around this seems
> > >    to write a wrapper script around git-shell that catches
> > >    git-receive-pack commands and executes something like
> > >    
> > >        git -c receive.hideRefs=[...] receive-pack [...]
> > >    
> > >    instead of forwarding those commands to git-shell.
> > 
> > This part we have never discussed in the thread, I think.  Why do
> > you need to override, instead of having these in the repository's
> > config files?
> > 
> > Is it because a repository may host multiple pseudo repositories in
> > the form of "namespaces" but they must share the same config file,
> > and you would want to customize per "namespace"?
> > 

Yes. As I said in the original thread, I want to set receive.hideRefs to
hide everything outside the current namespace, i.e. something equivalent
to

    git -c receive.hideRefs='refs/' -c receive.hideRefs="!refs/namespaces/$foo" receive-pack /some/path

if receive.hideRefs would work with absolute (unstripped) namespaces.

> > For that we may want to enhance the [include] mechanism.  Something
> > like
> > 
> >       [include "namespace=foo"]
> >               path = /path/to/foo/specific/config.txt
> > 
> >       [include "namespace=bar"]
> >               path = /path/to/bar/specific/config.txt
> > 
> > Cc'ing Peff as we have discussed this kind of conditional inclusion
> > in the past...
> 

That would work but it would still be very cumbersome. Imagine that
there is a single repository with 100000 pseudo repositories inside. You
really don't want to create a config file and a indirection in the main
configuration for each of these pseudo repositories, just to build a
configuration equivalent to the single line I described above.

> [...]
> I am slightly confused, though, where the namespace is set in such a
> git-shell example. I have no really used ref namespaces myself, but my
> understanding is that they have to come from the environment. You can
> similarly set config through the environment. I don't think we've ever
> publicized that, but it is how "git -c" works. E.g.:
> 
>   $ git -c alias.foo='!env' -c another.option=true foo | grep GIT_
>   GIT_CONFIG_PARAMETERS='alias.foo='\!'env' 'another.option=true'
> [...]

Yes, the Git namespace is passed through the environment by setting
GIT_NAMESPACE and GIT_CONFIG_PARAMETERS is exactly what I was looking
for! Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Allow hideRefs to match refs outside the namespace
  2015-10-31  8:49     ` Lukas Fleischer
@ 2015-10-31 17:31       ` Junio C Hamano
  2015-10-31 23:40         ` Lukas Fleischer
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-31 17:31 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git

Lukas Fleischer <lfleischer@lfos.de> writes:

>> If somebody is using namespaces and has "refs/frotz/" in the
>> hiderefs configuration, we hide refs/frotz/ no matter which
>> namespace is being accessed.  With this change, with the removal the
>> check from show_ref(), wouldn't such a repository suddenly see a
>> behaviour change?
>> [...]
>
> It would indeed. However, we cannot stay 100% backwards compatible when
> adding support for matching refs outside the current namespace without
> introducing new syntax. For example, if Git namespaces are in use (i.e.
> GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would
> not have hidden refs/namespaces/foo/refs/bar before the change but it
> does afterwards. You might argue that nobody would have added
> "refs/namespaces/foo/refs/bar" to hideRefs in the first place...

I won't.  To the current users, when they say they want to exclude
"refs/foo", they mean they do not want to advertise the fact that a
ref "refs/foo/*" exists in their repository (either the whole thing
if that is how it is accessed, or in the namespace being
accessed). and you can replace "foo" with any string, including the
ones that contain "/namespaces/", i.e. the user wanted to exclude
refs from nested ones.

I suspect what you wrote in the above is being a bit too defeatist,
though.  We only need to prevent regressions to user with existing
and valid configurations.

You earlier (re)discovered a good approach to introduce a new
feature without breaking settings of existing users when we
discussed a "whitelist".  Since setting the configuration to an
empty string did not do anything in the old code, an empty string
was an invalid and non-working setting.  By taking advantage of that
fact, you safely can say "if you start with an empty that would
match everything, we'll treat all the others differently from the
way we did before" if you wanted to.  I think you can follow the
same principle here.  For example, I can imagine that the rule for
the "ref-is-hidden" can be updated to:

 * it now takes refname and also the fullname before stripping the
   namespace;

 * hide patterns that is prefixed with '!' means negative, just as
   before;

 * (after possibly '!' is stripped), hide patterns that is prefixed
   with '^', which was invalid before, means check the fullname with
   namespace prefix, which is a new rule;

 * otherwise, check the refname after stripping the namespace.

Such an update would allow a new feature "we now allow you to write
a pattern that determines the match before stripping the namespace
prefix" without breaking the existing repositories, no?

After looking at the current code, I have to say that the way
ref-is-hidden and show_ref_cb() interact with each other is not very
well designed when namespaces are in use.  I suspect that this is
because the "namespace" stuff was bolted on to the system without
thinking things through.  For example, people may want to hide
refs/changes/* and with the current code, refs/changes/* from your
own namespace will be filtered out, but the corresponding hierarchy
from other namespaces will be included after getting turned into
".have".  And that cannot be a useful behaviour.  Tips of
refs/changes/* would be closely related to the corresponding
branches, which means that it would help reducing the object
transfer if they are included, and the fact that the user hides them
is that the user values it more to reduce the size of the initial
ref advertisement more than the potential reduction of the object
transfer cost.  If other pseudo repositories (aka namespaces) are
projects totally unrelated to yours, inluding their refs/changes/*
(or any of their refs, for that matter) would not help the later
object transfer cost, and including them in the initial ref
advertisement would not achieve anything.  Even if other namespaces
are projects that are closely related to yours, if you are excluding
refs/changes/* from your own, that is a strong sign that you do not
want their refs/changes/*, either.

Assuming other namespaces are forks of the same project as yours
(and otherwise the repository management strategy needs to be
rethought--using namespace for them is not gaining anything other
than making your repack more costly), it is likely that all of them
share a lot of refs that point at the same object (think "tags").
Do we end up sending a lot of ".have" for exactly the same object
number of times?  Even though we cannot dedup show_ref() lines that
talk about concrete refs (because they talk about what refs exist at
which value, and the sending side would use them to locally reject
non-ff pushes, for example), ".have" lines that talk about the same
object can be safely deduped.  This is not directly related to your
topic of "what should be included in the advertisement", but a
potentially good thing to fix, if it indeed turns out that we are
sending a lot of duplicate ".have"s.  A fix in that would make
things better for everybody (not just namespace users, but those who
show the ".have" lines from the refs in the repository we borrow
objects from).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Allow hideRefs to match refs outside the namespace
  2015-10-31 17:31       ` Junio C Hamano
@ 2015-10-31 23:40         ` Lukas Fleischer
  2015-11-01 11:27           ` Lukas Fleischer
  0 siblings, 1 reply; 17+ messages in thread
From: Lukas Fleischer @ 2015-10-31 23:40 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote:
> [...]
> You earlier (re)discovered a good approach to introduce a new
> feature without breaking settings of existing users when we
> discussed a "whitelist".  Since setting the configuration to an
> empty string did not do anything in the old code, an empty string
> was an invalid and non-working setting.  By taking advantage of that
> fact, you safely can say "if you start with an empty that would
> match everything, we'll treat all the others differently from the
> way we did before" if you wanted to.  I think you can follow the
> same principle here.  For example, I can imagine that the rule for
> the "ref-is-hidden" can be updated to:
> 
>  * it now takes refname and also the fullname before stripping the
>    namespace;
> 
>  * hide patterns that is prefixed with '!' means negative, just as
>    before;
> 
>  * (after possibly '!' is stripped), hide patterns that is prefixed
>    with '^', which was invalid before, means check the fullname with
>    namespace prefix, which is a new rule;
> 
>  * otherwise, check the refname after stripping the namespace.
> 
> Such an update would allow a new feature "we now allow you to write
> a pattern that determines the match before stripping the namespace
> prefix" without breaking the existing repositories, no?
> 

Yes. If I understood you correctly, this is exactly what I suggested in
the last paragraph of my previous email (the only difference being that
I suggested to use "/" as full name indicator instead of "^" but that is
just an implementation detail). I will look into implementing this if
that is the way we want to go.

> [...]
> Assuming other namespaces are forks of the same project as yours
> (and otherwise the repository management strategy needs to be
> rethought--using namespace for them is not gaining anything other
> than making your repack more costly), it is likely that all of them
> share a lot of refs that point at the same object (think "tags").
> Do we end up sending a lot of ".have" for exactly the same object
> number of times?  Even though we cannot dedup show_ref() lines that
> talk about concrete refs (because they talk about what refs exist at
> which value, and the sending side would use them to locally reject
> non-ff pushes, for example), ".have" lines that talk about the same
> object can be safely deduped.  This is not directly related to your
> topic of "what should be included in the advertisement", but a
> potentially good thing to fix, if it indeed turns out that we are
> sending a lot of duplicate ".have"s.  A fix in that would make
> things better for everybody (not just namespace users, but those who
> show the ".have" lines from the refs in the repository we borrow
> objects from).

Yes, I think we currently send a lot of duplicate lines. Would be nice
to have that fixed as well.

Note that we do use Git namespaces to store a lot of different but
similar pseudo repositories (i.e. they do not share any history but the
objects have huge similarities). Even though the pseudo repositories
itself are tiny, having the objects in a shared object storage reduces
the size significantly. Other people probably use separate repositories,
combined with something like GIT_OBJECT_DIRECTORY and preciousObjects
for that. Using Git namespaces, however, allows to run `git gc`/`git
repack` without needing to take care of maintaining back references to
the pseudo repositories and, more importantly, allows for storing all
the refs in a single "packed-refs" file which did reduce the size the
size by another factor of 10 in our tests. That massive difference in
size is probably mostly due to the fact that the actual content of each
repository is just some 100 bytes. Not sure if saving that much space
can currently be achieved with any other approach.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Allow hideRefs to match refs outside the namespace
  2015-10-31 23:40         ` Lukas Fleischer
@ 2015-11-01 11:27           ` Lukas Fleischer
  2015-11-01 18:18             ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Lukas Fleischer @ 2015-11-01 11:27 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Sun, 01 Nov 2015 at 00:40:39, Lukas Fleischer wrote:
> On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote:
> > [...]
> > You earlier (re)discovered a good approach to introduce a new
> > feature without breaking settings of existing users when we
> > discussed a "whitelist".  Since setting the configuration to an
> > empty string did not do anything in the old code, an empty string
> > was an invalid and non-working setting.  By taking advantage of that
> > fact, you safely can say "if you start with an empty that would
> > match everything, we'll treat all the others differently from the
> > way we did before" if you wanted to.  I think you can follow the
> > same principle here.  For example, I can imagine that the rule for
> > the "ref-is-hidden" can be updated to:
> > 
> >  * it now takes refname and also the fullname before stripping the
> >    namespace;
> > 
> >  * hide patterns that is prefixed with '!' means negative, just as
> >    before;
> > 
> >  * (after possibly '!' is stripped), hide patterns that is prefixed
> >    with '^', which was invalid before, means check the fullname with
> >    namespace prefix, which is a new rule;
> > 
> >  * otherwise, check the refname after stripping the namespace.
> > 
> > Such an update would allow a new feature "we now allow you to write
> > a pattern that determines the match before stripping the namespace
> > prefix" without breaking the existing repositories, no?
> > 
> 
> Yes. If I understood you correctly, this is exactly what I suggested in
> the last paragraph of my previous email (the only difference being that
> I suggested to use "/" as full name indicator instead of "^" but that is
> just an implementation detail). I will look into implementing this if
> that is the way we want to go.
> [...]

There are two more things I noticed.

Firstly, while looking for other callers of ref_is_hidden(), I realized
that send_ref() in upload-pack.c contains these lines of code:

    const char *refname_nons = strip_namespace(refname);                    
    struct object_id peeled;                                                
                                                                            
    if (mark_our_ref(refname, oid))                                         
            return 0;                                                       

where mark_our_ref() performs the ref_is_hidden() check on its first
parameter. So, in contrast to receive-pack, we already match the
original full reference (and not the stripped one) against the hideRefs
pattern there. In particular, when using transfer.hideRefs, the same
pattern does different things when receiving and uploading.

Now, this cannot be intended behavior and I do not think this is
something we want to retain when improving that feature. My suggestion
is:

1. Define the (current) semantics of hideRefs pattern. It either needs
   to be defined to match full references or stripped references. Both
   definitions are equivalent when Git namespaces are not used.
   
   It probably makes sense to define hideRefs patterns to match stripped
   references. If anybody relied on the upload-pack behavior of patterns
   matching full references, it may happen that more refs are hidden
   when that behavior is adjusted to match the new hideRefs semantics.
   The administrator would become aware of that change soon if it
   affects anything (i.e. hides things that should not be hidden). But I
   am pretty sure that this behavior isn't currently being relied on
   either way. Both Git namespaces and hideRefs aren't very popular
   features and anybody using that combination would probably have
   noticed the inconsistency and reported a bug earlier.

2. Improve the documentation and describe the hideRefs semantics better.
   Include details on the choice we made in (1).

3. Fix the send_ref() code in either receive-pack or upload-pack,
   depending on which is buggy according to our new definition.

4. Improve hideRefs patterns and allow to match both full references and
   stripped references by using a special indicator as suggested
   earlier.

5. Add a note on the change in behavior to the release notes of the
   release that "breaks backwards compatibility". Putting it in quotes
   because I actually think that we are fixing a bug rather than
   breaking compatibility. But since there was no documentation on the
   correct behavior, the former implementation was, technically, the
   only specification of "correct" behavior that existed at that
   point...

The second thing I noticed is that having syntax for allowing matches
against both full references and stripped references is extremely handy
and desirable, even if we would not have to introduce it for backwards
compatibility. For example, using the syntax Junio described earlier, my
initial use case could be solved by

    receive.hideRefs=^refs/
    receive.hideRefs=!refs/

which means "Hide all references but do not hide references from the
current namespace." Here, I am assuming that patterns for stripped refs
never match anything outside the current namespace because those
patterns become NULL after stripping. This kind of supersedes the other
discussion on setting configuration via environment as well (at least in
this context).

Regards,
Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Allow hideRefs to match refs outside the namespace
  2015-11-01 11:27           ` Lukas Fleischer
@ 2015-11-01 18:18             ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2015-11-01 18:18 UTC (permalink / raw
  To: Lukas Fleischer; +Cc: git

Lukas Fleischer <lfleischer@lfos.de> writes:

> Now, this cannot be intended behavior and I do not think this is
> something we want to retain when improving that feature.

Yup, that makes me suspect that namespace support with hiderefs was
done without giving much thought even stronger than before, and the
fact that nobody has brought it up so far suggests it would be much
smaller deal than usual if a fix brings in incompatibilities to
those who use namespaces.

> 1. Define the (current) semantics of hideRefs pattern. It either needs
>    to be defined to match full references or stripped references. Both
>    definitions are equivalent when Git namespaces are not used.
>    
>    It probably makes sense to define hideRefs patterns to match stripped
>    references.

OK.

> 2. Improve the documentation and describe the hideRefs semantics better.
> 3. Fix the send_ref() code in either receive-pack or upload-pack,
> 4. Improve hideRefs patterns and allow to match both full references and
> 5. Add a note on the change in behavior to the release notes of the

All OK.

> The second thing I noticed is that having syntax for allowing matches
> against both full references and stripped references is extremely handy
> and desirable, even if we would not have to introduce it for backwards
> compatibility. For example, using the syntax Junio described earlier, my
> initial use case could be solved by
>
>     receive.hideRefs=^refs/
>     receive.hideRefs=!refs/
>
> which means "Hide all references but do not hide references from the
> current namespace." Here, I am assuming that patterns for stripped refs
> never match anything outside the current namespace because those
> patterns become NULL after stripping.

I would instead assume that the presence of ^ (or !^) in front would
signal "do not strip before checking".  !refs/ would mean "after
stripping, does it begin with refs/?  If so then do not hide it".

But that does not change the conclusion.  With ^refs/ that says
"hide everything that matches refs/ before stripping" (i.e. do not
include anything from anywhere), that is overriden by !refs/ that
says "but do not hide anything that matches refs/ after stripping"
(do include everything from my namespace), I'd think that you'd get
your desired behaviour.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-11-01 18:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-26  8:09 [PATCH/RFC] receive-pack: allow for hiding refs outside the namespace Lukas Fleischer
2015-10-26 19:58 ` Junio C Hamano
     [not found]   ` <20151027053916.3030.8259@typhoon.lan>
     [not found]     ` <20151027055911.4877.94179@typhoon.lan>
2015-10-27 14:32       ` Lukas Fleischer
2015-10-27 18:18         ` Junio C Hamano
2015-10-28  7:00           ` Lukas Fleischer
2015-10-28 13:42             ` Jeff King
2015-10-28 15:48             ` Junio C Hamano
2015-10-30 21:31         ` Junio C Hamano
2015-10-30 21:46           ` Jeff King
2015-10-31  9:03             ` Lukas Fleischer
2015-10-28 15:42 ` [PATCH] Allow hideRefs to match " Lukas Fleischer
2015-10-28 16:21   ` Junio C Hamano
2015-10-31  8:49     ` Lukas Fleischer
2015-10-31 17:31       ` Junio C Hamano
2015-10-31 23:40         ` Lukas Fleischer
2015-11-01 11:27           ` Lukas Fleischer
2015-11-01 18:18             ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).