git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] submodule: use cheaper check for submodule pushes
@ 2017-07-12 23:45 Stefan Beller
  2017-07-13  0:01 ` Jonathan Nieder
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2017-07-12 23:45 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

In the function push_submodule[1] we use add_submodule_odb[2] to determine
if a submodule has been populated. However the function does not work with
the submodules objects that are added, instead a new child process is used
to perform the actual push in the submodule.

Use is_submodule_populated[3] that is cheaper to guard from unpopulated
submodules.

[1] 'push_submodule' was added in eb21c732d6 (push: teach
    --recurse-submodules the on-demand option, 2012-03-29)
[2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
    --submodule option to the diff option family, 2009-10-19)
[3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
    add helper to determine if a submodule is populated, 2016-12-16)

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index da2b484879..55afad3e8c 100644
--- a/submodule.c
+++ b/submodule.c
@@ -976,7 +976,9 @@ static int push_submodule(const char *path,
 			  const struct string_list *push_options,
 			  int dry_run)
 {
-	if (add_submodule_odb(path))
+	int code;
+
+	if (!is_submodule_populated_gently(path, &code))
 		return 1;
 
 	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
-- 
2.13.2.695.g117ddefdb4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-12 23:45 [PATCH] submodule: use cheaper check for submodule pushes Stefan Beller
@ 2017-07-13  0:01 ` Jonathan Nieder
  2017-07-13  0:09   ` Stefan Beller
  2017-07-13  0:53   ` Junio C Hamano
  0 siblings, 2 replies; 20+ messages in thread
From: Jonathan Nieder @ 2017-07-13  0:01 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi,

Stefan Beller wrote:

> In the function push_submodule[1] we use add_submodule_odb[2] to determine
> if a submodule has been populated. However the function does not work with
> the submodules objects that are added, instead a new child process is used
> to perform the actual push in the submodule.
>
> Use is_submodule_populated[3] that is cheaper to guard from unpopulated
> submodules.
>
> [1] 'push_submodule' was added in eb21c732d6 (push: teach
>     --recurse-submodules the on-demand option, 2012-03-29)
> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
>     --submodule option to the diff option family, 2009-10-19)
> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
>     add helper to determine if a submodule is populated, 2016-12-16)

These footnotes don't answer the question that I really have: why did
this use add_submodule_odb in the first place?

E.g. did the ref iteration code require access to the object store
previously and stop requiring it later?

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/submodule.c b/submodule.c
> index da2b484879..55afad3e8c 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -976,7 +976,9 @@ static int push_submodule(const char *path,
>  			  const struct string_list *push_options,
>  			  int dry_run)
>  {
> -	if (add_submodule_odb(path))
> +	int code;
> +
> +	if (!is_submodule_populated_gently(path, &code))

Should this examine the code to distinguish between hard errors
(e.g. "Error reading .git") and a missing repository?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13  0:01 ` Jonathan Nieder
@ 2017-07-13  0:09   ` Stefan Beller
  2017-07-13  0:53   ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Beller @ 2017-07-13  0:09 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git@vger.kernel.org

On Wed, Jul 12, 2017 at 5:01 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

> These footnotes don't answer the question that I really have: why did
> this use add_submodule_odb in the first place?

oh, I forgot to put that down: presumably add_submodule_odb was
used because it was available back then? Note the dates!

> E.g. did the ref iteration code require access to the object store
> previously and stop requiring it later?

No. See [1], it was overkill since the beginning of time.

>> -     if (add_submodule_odb(path))
>> +     int code;
>> +
>> +     if (!is_submodule_populated_gently(path, &code))
>
> Should this examine the code to distinguish between hard errors
> (e.g. "Error reading .git") and a missing repository?

add_submodule_odb does neither, so I think this is best kept without
additional checks.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13  0:01 ` Jonathan Nieder
  2017-07-13  0:09   ` Stefan Beller
@ 2017-07-13  0:53   ` Junio C Hamano
  2017-07-13  5:14     ` Stefan Beller
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2017-07-13  0:53 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Stefan Beller, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>> In the function push_submodule[1] we use add_submodule_odb[2] to determine
>> if a submodule has been populated. However the function does not work with
>> the submodules objects that are added, instead a new child process is used
>> to perform the actual push in the submodule.
>>
>> Use is_submodule_populated[3] that is cheaper to guard from unpopulated
>> submodules.
>>
>> [1] 'push_submodule' was added in eb21c732d6 (push: teach
>>     --recurse-submodules the on-demand option, 2012-03-29)
>> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
>>     --submodule option to the diff option family, 2009-10-19)
>> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
>>     add helper to determine if a submodule is populated, 2016-12-16)
>
> These footnotes don't answer the question that I really have: why did
> this use add_submodule_odb in the first place?
>
> E.g. did the ref iteration code require access to the object store
> previously and stop requiring it later?

Yes, the most important question is if it is safe to lose the access
to the object store of the submodule.  It is an endgame we should
aim for to get rid of add_submodule_odb(), but does the rest of this
codepath not require objects in the submodule at all or do we still
need to change something to make it so?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13  0:53   ` Junio C Hamano
@ 2017-07-13  5:14     ` Stefan Beller
  2017-07-13 18:37       ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2017-07-13  5:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, git@vger.kernel.org

On Wed, Jul 12, 2017 at 5:53 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>>> In the function push_submodule[1] we use add_submodule_odb[2] to determine
>>> if a submodule has been populated. However the function does not work with
>>> the submodules objects that are added, instead a new child process is used
>>> to perform the actual push in the submodule.
>>>
>>> Use is_submodule_populated[3] that is cheaper to guard from unpopulated
>>> submodules.
>>>
>>> [1] 'push_submodule' was added in eb21c732d6 (push: teach
>>>     --recurse-submodules the on-demand option, 2012-03-29)
>>> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
>>>     --submodule option to the diff option family, 2009-10-19)
>>> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
>>>     add helper to determine if a submodule is populated, 2016-12-16)
>>
>> These footnotes don't answer the question that I really have: why did
>> this use add_submodule_odb in the first place?
>>
>> E.g. did the ref iteration code require access to the object store
>> previously and stop requiring it later?
>
> Yes, the most important question is if it is safe to lose the access
> to the object store of the submodule.  It is an endgame we should
> aim for to get rid of add_submodule_odb(), but does the rest of this
> codepath not require objects in the submodule at all or do we still
> need to change something to make it so?

Yes, as the code in the current form as well as in its first occurrence
used the result of add_submodule_odb to determine if to spawn a child process.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13  5:14     ` Stefan Beller
@ 2017-07-13 18:37       ` Junio C Hamano
  2017-07-13 19:39         ` Stefan Beller
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2017-07-13 18:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Nieder, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> On Wed, Jul 12, 2017 at 5:53 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jonathan Nieder <jrnieder@gmail.com> writes:
>>
>>>> In the function push_submodule[1] we use add_submodule_odb[2] to determine
>>>> if a submodule has been populated. However the function does not work with
>>>> the submodules objects that are added, instead a new child process is used
>>>> to perform the actual push in the submodule.
>>>>
>>>> Use is_submodule_populated[3] that is cheaper to guard from unpopulated
>>>> submodules.
>>>>
>>>> [1] 'push_submodule' was added in eb21c732d6 (push: teach
>>>>     --recurse-submodules the on-demand option, 2012-03-29)
>>>> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
>>>>     --submodule option to the diff option family, 2009-10-19)
>>>> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
>>>>     add helper to determine if a submodule is populated, 2016-12-16)
>>>
>>> These footnotes don't answer the question that I really have: why did
>>> this use add_submodule_odb in the first place?
>>>
>>> E.g. did the ref iteration code require access to the object store
>>> previously and stop requiring it later?
>>
>> Yes, the most important question is if it is safe to lose the access
>> to the object store of the submodule.  It is an endgame we should
>> aim for to get rid of add_submodule_odb(), but does the rest of this
>> codepath not require objects in the submodule at all or do we still
>> need to change something to make it so?
>
> Yes, as the code in the current form as well as in its first occurrence
> used the result of add_submodule_odb to determine if to spawn a child process.

The original added so that the return value of the call can be used
for that, and the current code still uses the return value for that
purpose.

That much is already known.  

I think Jonathan's question (which I concurred) is if we also ended
up relying on the side effect of calling that function (i.e. being
able to now find objects that are not in our repository but in the
submodule's object store).  By looking at the eb21c732d6, we can
tell that the original didn't mean to and didn't add any code that
relies on the ability to be able to read from the submodule object
store.  I am not sure if that is still true after 5 years (i.e. is
there any new code added in the meantime that made us depend on the
ability to read from submodule object store?).

My hunch (and hope) is that we are probably safe, but that is a lot
weaker than "yes this is a good change we want to apply".


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13 18:37       ` Junio C Hamano
@ 2017-07-13 19:39         ` Stefan Beller
  2017-07-13 20:48           ` Jonathan Nieder
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2017-07-13 19:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, git@vger.kernel.org

On Thu, Jul 13, 2017 at 11:37 AM, Junio C Hamano <gitster@pobox.com> wrote:

>
> I think Jonathan's question (which I concurred) is if we also ended
> up relying on the side effect of calling that function (i.e. being
> able to now find objects that are not in our repository but in the
> submodule's object store).  By looking at the eb21c732d6, we can
> tell that the original didn't mean to and didn't add any code that
> relies on the ability to be able to read from the submodule object
> store.  I am not sure if that is still true after 5 years (i.e. is
> there any new code added in the meantime that made us depend on the
> ability to read from submodule object store?).

Yes we are safe, because the function itself only spawns a child process
(not using any of the objects).

It's only caller push_unpushed_submodules also doesn't rely on objects
loaded after calling push_submodule.

The caller of push_unpushed_submodules (transport.c, transport_push)
also doesn't need submodule objects loaded.

> My hunch (and hope) is that we are probably safe, but that is a lot
> weaker than "yes this is a good change we want to apply".

Given the above (I went through the code), all I can do is repeating
"yes this is a good change we want to apply".

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13 19:39         ` Stefan Beller
@ 2017-07-13 20:48           ` Jonathan Nieder
  2017-07-13 20:54             ` Stefan Beller
  0 siblings, 1 reply; 20+ messages in thread
From: Jonathan Nieder @ 2017-07-13 20:48 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org

Hi,

Stefan Beller wrote:

> Yes we are safe, because the function itself only spawns a child process
> (not using any of the objects).
>
> It's only caller push_unpushed_submodules also doesn't rely on objects
> loaded after calling push_submodule.
>
> The caller of push_unpushed_submodules (transport.c, transport_push)
> also doesn't need submodule objects loaded.

Thanks for looking into it.  This is what the commit message should
say to help reviewers or people trying to understand it later.  The
footnotes don't help and are distracting, except that it makes sense
to point out the original GSoC patch to say the alternate submodule
odb wasn't needed even then.

E.g.:

 Subject: push: do not add submodule odb as an alternate when recursing on demand

 "git push --recurse-submodules=on-demand" adds each submodule as an
 alternate with add_submodule_odb before checking whether the
 submodule has anything to push and pushing it if so.

 However, it never accesses any objects from the submodule.  In the
 parent process it uses the submodule's ref database to see if there
 is anything to push.  The actual push (which does rely on objects)
 occurs in a child process.

 The same was try when this call was originally added in
 v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
 option, 2012-03-29).  Most likely it was added by analogy with
 fetch --recurse-submodules=on-demand, which did use the submodule's
 object database.

 Use is_submodule_populated_gently instead, which is simpler and
 cheaper.

[...]
> On Thu, Jul 13, 2017 at 11:37 AM, Junio C Hamano <gitster@pobox.com> wrote:

>> My hunch (and hope) is that we are probably safe, but that is a lot
>> weaker than "yes this is a good change we want to apply".
>
> Given the above (I went through the code), all I can do is repeating
> "yes this is a good change we want to apply".

With such a commit message change, this seems like a reasonable change
in principle (though I haven't looked carefully to verify it).

My one doubt is the is_submodule_populated_gently.  Why are we using
that instead of simpler is_submodule_populated?  The names and API
comments don't explain.

Thanks for your patient explanations,
Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] submodule: use cheaper check for submodule pushes
  2017-07-13 20:48           ` Jonathan Nieder
@ 2017-07-13 20:54             ` Stefan Beller
  2017-08-15 22:43               ` [PATCH] push: do not add submodule odb as an alternate when recursing on demand Stefan Beller
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2017-07-13 20:54 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Jul 13, 2017 at 1:48 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Stefan Beller wrote:
>
>> Yes we are safe, because the function itself only spawns a child process
>> (not using any of the objects).
>>
>> It's only caller push_unpushed_submodules also doesn't rely on objects
>> loaded after calling push_submodule.
>>
>> The caller of push_unpushed_submodules (transport.c, transport_push)
>> also doesn't need submodule objects loaded.
>
> Thanks for looking into it.  This is what the commit message should
> say to help reviewers or people trying to understand it later.  The
> footnotes don't help and are distracting, except that it makes sense
> to point out the original GSoC patch to say the alternate submodule
> odb wasn't needed even then.
>
> E.g.:
>
>  Subject: push: do not add submodule odb as an alternate when recursing on demand
>
>  "git push --recurse-submodules=on-demand" adds each submodule as an
>  alternate with add_submodule_odb before checking whether the
>  submodule has anything to push and pushing it if so.
>
>  However, it never accesses any objects from the submodule.  In the
>  parent process it uses the submodule's ref database to see if there
>  is anything to push.  The actual push (which does rely on objects)
>  occurs in a child process.
>
>  The same was try when this call was originally added in
>  v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
>  option, 2012-03-29).  Most likely it was added by analogy with
>  fetch --recurse-submodules=on-demand, which did use the submodule's
>  object database.
>
>  Use is_submodule_populated_gently instead, which is simpler and
>  cheaper.

Thanks for giving a good example of commit message that I could use
in a reroll.


> With such a commit message change, this seems like a reasonable change
> in principle (though I haven't looked carefully to verify it).
>
> My one doubt is the is_submodule_populated_gently.  Why are we using
> that instead of simpler is_submodule_populated?  The names and API
> comments don't explain.

One could posit this is laziness of thinking.
See 15cdc64776 (make is_submodule_populated gently, 2017-03-14),
and discover there is no non-gentle version of is_submodule_populated.
And for each new use, it may be cheaper to just use the gentle version
instead of adding a non-gentle version.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-07-13 20:54             ` Stefan Beller
@ 2017-08-15 22:43               ` Stefan Beller
  2017-08-15 23:10                 ` Jonathan Nieder
                                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Stefan Beller @ 2017-08-15 22:43 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jrnieder

"git push --recurse-submodules=on-demand" adds each submodule as an
alternate with add_submodule_odb before checking whether the
submodule has anything to push and pushing it if so.

However, it never accesses any objects from the submodule.  In the
parent process it uses the submodule's ref database to see if there
is anything to push.  The actual push (which does rely on objects)
occurs in a child process.

The same was true when this call was originally added in
v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
option, 2012-03-29).  Most likely it was added by analogy with
fetch --recurse-submodules=on-demand, which did use the submodule's
object database.

Use is_submodule_populated_gently instead, which is simpler and
cheaper.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

 Originally I intended to send this out as part of a larger series,
 but the series is getting too large series, sending all things in smaller
 units!
 
 Thanks,
 Stefan

 submodule.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 111a3007fc..e20216bc0f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -966,7 +966,9 @@ static int push_submodule(const char *path,
 			  const struct string_list *push_options,
 			  int dry_run)
 {
-	if (add_submodule_odb(path))
+	int code;
+
+	if (!is_submodule_populated_gently(path, &code))
 		return 1;
 
 	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
-- 
2.14.0.rc0.3.g6c2e499285


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-15 22:43               ` [PATCH] push: do not add submodule odb as an alternate when recursing on demand Stefan Beller
@ 2017-08-15 23:10                 ` Jonathan Nieder
  2017-08-15 23:14                 ` Jonathan Nieder
  2017-08-15 23:23                 ` Junio C Hamano
  2 siblings, 0 replies; 20+ messages in thread
From: Jonathan Nieder @ 2017-08-15 23:10 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, gitster

Stefan Beller wrote:

> "git push --recurse-submodules=on-demand" adds each submodule as an
> alternate with add_submodule_odb before checking whether the
> submodule has anything to push and pushing it if so.
>
> However, it never accesses any objects from the submodule.  In the
> parent process it uses the submodule's ref database to see if there
> is anything to push.  The actual push (which does rely on objects)
> occurs in a child process.
>
> The same was true when this call was originally added in
> v1.7.11-rc0~111^2 (push: teach --recurse-submodules the on-demand
> option, 2012-03-29).  Most likely it was added by analogy with
> fetch --recurse-submodules=on-demand, which did use the submodule's
> object database.
>
> Use is_submodule_populated_gently instead, which is simpler and
> cheaper.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for picking up this loose end.

> diff --git a/submodule.c b/submodule.c
> index 111a3007fc..e20216bc0f 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -966,7 +966,9 @@ static int push_submodule(const char *path,
>  			  const struct string_list *push_options,
>  			  int dry_run)
>  {
> -	if (add_submodule_odb(path))
> +	int code;
> +
> +	if (!is_submodule_populated_gently(path, &code))
>  		return 1;
>  
>  	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-15 22:43               ` [PATCH] push: do not add submodule odb as an alternate when recursing on demand Stefan Beller
  2017-08-15 23:10                 ` Jonathan Nieder
@ 2017-08-15 23:14                 ` Jonathan Nieder
  2017-08-15 23:27                   ` Stefan Beller
  2017-08-15 23:23                 ` Junio C Hamano
  2 siblings, 1 reply; 20+ messages in thread
From: Jonathan Nieder @ 2017-08-15 23:14 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, gitster

Stefan Beller wrote:

> Use is_submodule_populated_gently instead, which is simpler and
> cheaper.
[...]
> --- a/submodule.c
> +++ b/submodule.c
> @@ -966,7 +966,9 @@ static int push_submodule(const char *path,
>  			  const struct string_list *push_options,
>  			  int dry_run)
>  {
> -	if (add_submodule_odb(path))
> +	int code;
> +
> +	if (!is_submodule_populated_gently(path, &code))
>  		return 1;

Ah, I forgot about this detail.  I don't think it should block this
patch (so my Reviewed-by still stands), but I wonder why this needs to
be gentle.  add_submodule_odb is gentle so that is the conservative
thing to do, but that doesn't mean it is the *right* thing to do.

If this passed NULL instead of &code as the second argument, would
anything break?

Could there be a comment explaining what kind of error we are
expecting and why it is okay to continue when that error is
encountered without any error handling?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-15 22:43               ` [PATCH] push: do not add submodule odb as an alternate when recursing on demand Stefan Beller
  2017-08-15 23:10                 ` Jonathan Nieder
  2017-08-15 23:14                 ` Jonathan Nieder
@ 2017-08-15 23:23                 ` Junio C Hamano
  2017-08-15 23:31                   ` Stefan Beller
  2 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2017-08-15 23:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder

Stefan Beller <sbeller@google.com> writes:

> "git push --recurse-submodules=on-demand" adds each submodule as an
> alternate with add_submodule_odb before checking whether the
> submodule has anything to push and pushing it if so.
>
> However, it never accesses any objects from the submodule.
> ...
> Use is_submodule_populated_gently instead, which is simpler and
> cheaper.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
>  Originally I intended to send this out as part of a larger series,
>  but the series is getting too large series, sending all things in smaller
>  units!

This vaguely reminds me that you sent something imilar perhaps for a
different codepath.

Is "is it populated" a good thing to check here, though?  IIRC,
add-submodule-odb allows you to add the object database of an
inactivated submodule, so this seems to change the behaviour.  I do
not know if the behaviour change is a good thing (i.e. bugfix) or
not (i.e. regression) offhand, though.

Thanks.

> diff --git a/submodule.c b/submodule.c
> index 111a3007fc..e20216bc0f 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -966,7 +966,9 @@ static int push_submodule(const char *path,
>  			  const struct string_list *push_options,
>  			  int dry_run)
>  {
> -	if (add_submodule_odb(path))
> +	int code;
> +
> +	if (!is_submodule_populated_gently(path, &code))
>  		return 1;
>  
>  	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-15 23:14                 ` Jonathan Nieder
@ 2017-08-15 23:27                   ` Stefan Beller
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Beller @ 2017-08-15 23:27 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git@vger.kernel.org, Junio C Hamano

On Tue, Aug 15, 2017 at 4:14 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Stefan Beller wrote:
>
>> Use is_submodule_populated_gently instead, which is simpler and
>> cheaper.
> [...]
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -966,7 +966,9 @@ static int push_submodule(const char *path,
>>                         const struct string_list *push_options,
>>                         int dry_run)
>>  {
>> -     if (add_submodule_odb(path))
>> +     int code;
>> +
>> +     if (!is_submodule_populated_gently(path, &code))
>>               return 1;
>
> Ah, I forgot about this detail.  I don't think it should block this
> patch (so my Reviewed-by still stands), but I wonder why this needs to
> be gentle.  add_submodule_odb is gentle so that is the conservative
> thing to do, but that doesn't mean it is the *right* thing to do.
>
> If this passed NULL instead of &code as the second argument, would
> anything break?

push_submodule, which is called by push_unpushed_submodules
just issues warnings on submodule push error, (which happen all
before the main push,) such that any submodule error does not prevent
the main push.

We could tighten that, but I would suggest another patch for that.

> Could there be a comment explaining what kind of error we are
> expecting and why it is okay to continue when that error is
> encountered without any error handling?

The same as in add_submodule_odb, digging through the
logs. I up to now we did not care about the submodule succeeding
as much, it was just an aid for the main repo push.

>
> Thanks,
> Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-15 23:23                 ` Junio C Hamano
@ 2017-08-15 23:31                   ` Stefan Beller
  2017-08-16  0:11                     ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2017-08-15 23:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Jonathan Nieder

On Tue, Aug 15, 2017 at 4:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> "git push --recurse-submodules=on-demand" adds each submodule as an
>> alternate with add_submodule_odb before checking whether the
>> submodule has anything to push and pushing it if so.
>>
>> However, it never accesses any objects from the submodule.
>> ...
>> Use is_submodule_populated_gently instead, which is simpler and
>> cheaper.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>
>>  Originally I intended to send this out as part of a larger series,
>>  but the series is getting too large series, sending all things in smaller
>>  units!
>
> This vaguely reminds me that you sent something imilar perhaps for a
> different codepath.

https://public-inbox.org/git/xmqqh8xzq6td.fsf@gitster.mtv.corp.google.com/

> Is "is it populated" a good thing to check here, though?  IIRC,
> add-submodule-odb allows you to add the object database of an
> inactivated submodule, so this seems to change the behaviour.  I do
> not know if the behaviour change is a good thing (i.e. bugfix) or
> not (i.e. regression) offhand, though.

Good point, we should be able to push non-populated, even inactive(?)
submodules. For that we strictly need add_submodule_odb here
(or the repo object of the submodule, eventually).

So let's retract this patch for now.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-15 23:31                   ` Stefan Beller
@ 2017-08-16  0:11                     ` Junio C Hamano
  2017-08-16  1:05                       ` Stefan Beller
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2017-08-16  0:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jonathan Nieder

Stefan Beller <sbeller@google.com> writes:

>> Is "is it populated" a good thing to check here, though?  IIRC,
>> add-submodule-odb allows you to add the object database of an
>> inactivated submodule, so this seems to change the behaviour.  I do
>> not know if the behaviour change is a good thing (i.e. bugfix) or
>> not (i.e. regression) offhand, though.
>
> Good point, we should be able to push non-populated, even inactive(?)
> submodules. For that we strictly need add_submodule_odb here
> (or the repo object of the submodule, eventually).
>
> So let's retract this patch for now.

Not so fast.  

I am not convinced "push --recursive" should touch a submodule that
was once cloned from the upstream and then deactivated, so using
add-submodule-odb to decide if the push should go through may be a
bug that we may want to fix, in which case the diff of the patch in
question may be good as-is.  We need to sell it as a bugfix to the
users, who may complain about behaviour change (if there is one).

On the other hand, even if it were desirable for such a deactivated
submodule to be pushed, as your log message explained, there is no
reason to contaminate the in-core object hash by calling the
add-submodule-odb helper, when the only thing we care about is "do
we have the refs and object store for this submodule? we do not care
if it is activated or not".  Perhaps there is a more appropriate
helper in submodule.c that answers that question that we should be
using instead of add-submodule-odb, and if there is not yet such a
helper, perhaps this indicates that we need to add such a helper,
which essentially is the early half of what add-submodule-odb does,
i.e. ask git_path_submodule() for the object store and check if that
directory exists.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-16  0:11                     ` Junio C Hamano
@ 2017-08-16  1:05                       ` Stefan Beller
  2017-08-16  2:08                         ` Jonathan Nieder
  2017-08-16 16:35                         ` Heiko Voigt
  0 siblings, 2 replies; 20+ messages in thread
From: Stefan Beller @ 2017-08-16  1:05 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan; +Cc: git@vger.kernel.org, Jonathan Nieder

On Tue, Aug 15, 2017 at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> Is "is it populated" a good thing to check here, though?  IIRC,
>>> add-submodule-odb allows you to add the object database of an
>>> inactivated submodule, so this seems to change the behaviour.  I do
>>> not know if the behaviour change is a good thing (i.e. bugfix) or
>>> not (i.e. regression) offhand, though.
>>
>> Good point, we should be able to push non-populated, even inactive(?)
>> submodules. For that we strictly need add_submodule_odb here
>> (or the repo object of the submodule, eventually).
>>
>> So let's retract this patch for now.
>
> Not so fast.

Ok, I took another look at the code.

While we may desire that un-populated submodules can be pushed
(due to checking out another revision where the submodule
doesn't exist, before pushing), this is not supported currently, because
the call to run the push in the submodule assumes there is a
"<path>/.git" on which the child process can operate.
So for now we HAVE to have the submodule populated.

  In the future we may have the a lighter version just checking the
  object store of the submodule. Maybe this use case in the submodule
  can be covered by the .remote/.imported mechanism as well, such
  that we'd know if we have any local objects.

> I am not convinced "push --recursive" should touch a submodule that
> was once cloned from the upstream and then deactivated, so using
> add-submodule-odb to decide if the push should go through may be a
> bug that we may want to fix, in which case the diff of the patch in
> question may be good as-is.  We need to sell it as a bugfix to the
> users, who may complain about behaviour change (if there is one).

correct. Coupled with the observation above, we want really need
both "is active and populated" (your paragraph suggest not-active
submodules don't need pushing, the populated bit comes from the
observation above)

> On the other hand, even if it were desirable for such a deactivated
> submodule to be pushed, as your log message explained, there is no
> reason to contaminate the in-core object hash by calling the
> add-submodule-odb helper, when the only thing we care about is "do
> we have the refs and object store for this submodule? we do not care
> if it is activated or not".  Perhaps there is a more appropriate
> helper in submodule.c that answers that question that we should be
> using instead of add-submodule-odb, and if there is not yet such a
> helper, perhaps this indicates that we need to add such a helper,
> which essentially is the early half of what add-submodule-odb does,
> i.e. ask git_path_submodule() for the object store and check if that
> directory exists.

I think due to the constraint of also needing a worktree the is_populated
helper is the correct choice here.


>
> Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-16  1:05                       ` Stefan Beller
@ 2017-08-16  2:08                         ` Jonathan Nieder
  2017-08-16  5:52                           ` Stefan Beller
  2017-08-16 16:35                         ` Heiko Voigt
  1 sibling, 1 reply; 20+ messages in thread
From: Jonathan Nieder @ 2017-08-16  2:08 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Jonathan Tan, git@vger.kernel.org

Hi,

Stefan Beller wrote:
> On Tue, Aug 15, 2017 at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>> Junio C Hamano wrote:

>>>> Is "is it populated" a good thing to check here, though?  IIRC,
>>>> add-submodule-odb allows you to add the object database of an
>>>> inactivated submodule, so this seems to change the behaviour.  I do
>>>> not know if the behaviour change is a good thing (i.e. bugfix) or
>>>> not (i.e. regression) offhand, though.
>>>
>>> Good point, we should be able to push non-populated, even inactive(?)
>>> submodules. For that we strictly need add_submodule_odb here
>>> (or the repo object of the submodule, eventually).
>>>
>>> So let's retract this patch for now.
>>
>> Not so fast.
>
> Ok, I took another look at the code.
>
> While we may desire that un-populated submodules can be pushed
> (due to checking out another revision where the submodule
> doesn't exist, before pushing), this is not supported currently, because
> the call to run the push in the submodule assumes there is a
> "<path>/.git" on which the child process can operate.
> So for now we HAVE to have the submodule populated.

It was not immediately obvious to me that this is just "for now".

I would be really confused if I had deactivated a submodule and
"git push --recurse-submodules" pushed from it anyway.  If the
submodule is active but not populated, then the question becomes "Why
wasn't it populated?"

If this is a bare repository, then nothing is populated, and pushing
from an active-but-unpopulated submodule sounds like a plausible wish.
But in a non-bare repository, I'm having trouble imagining the use
case that brings this situation about.

And where people have been needing this so far has been non-bare
repositories.  In that context, the check "is active and populated"
does not seem unusual or provisional.  Are you hinting that replacing
the check with "is active" would make it work well in bare
repositories?  I think I agree, though you'd have to be careful about
the case where the submodule is active but hasn't been fetched to
$GIT_DIR/modules yet.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-16  2:08                         ` Jonathan Nieder
@ 2017-08-16  5:52                           ` Stefan Beller
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Beller @ 2017-08-16  5:52 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, Jonathan Tan, git@vger.kernel.org

On Tue, Aug 15, 2017 at 7:08 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Stefan Beller wrote:
>> On Tue, Aug 15, 2017 at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Stefan Beller <sbeller@google.com> writes:
>>>> Junio C Hamano wrote:
>
>>>>> Is "is it populated" a good thing to check here, though?  IIRC,
>>>>> add-submodule-odb allows you to add the object database of an
>>>>> inactivated submodule, so this seems to change the behaviour.  I do
>>>>> not know if the behaviour change is a good thing (i.e. bugfix) or
>>>>> not (i.e. regression) offhand, though.
>>>>
>>>> Good point, we should be able to push non-populated, even inactive(?)
>>>> submodules. For that we strictly need add_submodule_odb here
>>>> (or the repo object of the submodule, eventually).
>>>>
>>>> So let's retract this patch for now.
>>>
>>> Not so fast.
>>
>> Ok, I took another look at the code.
>>
>> While we may desire that un-populated submodules can be pushed
>> (due to checking out another revision where the submodule
>> doesn't exist, before pushing), this is not supported currently, because
>> the call to run the push in the submodule assumes there is a
>> "<path>/.git" on which the child process can operate.
>> So for now we HAVE to have the submodule populated.
>
> It was not immediately obvious to me that this is just "for now".
>
> I would be really confused if I had deactivated a submodule and
> "git push --recurse-submodules" pushed from it anyway.

agreed.

>  If the
> submodule is active but not populated, then the question becomes "Why
> wasn't it populated?"

because you checked out a different revision where the sub is not
part of the working tree, for example. As long as you have locally
interesting "stuff" you want to transfer it. (IMHO even when the sub
is not active)

> If this is a bare repository, then nothing is populated,

Currently bare repos do not support having submodules
AT ALL, i.e. there is no modules/<name> directory
where the submodules would have any valuable information.
instead we *only* have the .gitmodules files at e.g. HEAD.

> and pushing
> from an active-but-unpopulated submodule sounds like a plausible wish.

it seems we're on the same page here.

> But in a non-bare repository, I'm having trouble imagining the use
> case that brings this situation about.

git checkout revision-with-sub
# hack.. hack.. hack..
"git commit --recurse-submodules -m bugfix"
# ^ not yet implemented
git checkout revision-without-sub
# Oh! I forgot to push the bugfix
git push --recurse-submodule <bugfix>

> And where people have been needing this so far has been non-bare
> repositories.  In that context, the check "is active and populated"
> does not seem unusual or provisional.  Are you hinting that replacing
> the check with "is active" would make it work well in bare
> repositories?

No I was saying:
* currently the code only works when "is populated"
* currently "is active" seems not to be considered

* in the future the code shall work even when not "is populated",
  (but submodule git dir is there at the very least)
* we may want to discuss if we want to care about "is active"
  additionally

>  I think I agree, though you'd have to be careful about
> the case where the submodule is active but hasn't been fetched to
> $GIT_DIR/modules yet.

When not fetched yet, there is no modules/<name> git dir and
we are sure there is no locally interesting information, so we
want to omit the push. (see above "has git dir" is the minimum
requirement, as opposed to current "is populated".

> Thanks,
> Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand
  2017-08-16  1:05                       ` Stefan Beller
  2017-08-16  2:08                         ` Jonathan Nieder
@ 2017-08-16 16:35                         ` Heiko Voigt
  1 sibling, 0 replies; 20+ messages in thread
From: Heiko Voigt @ 2017-08-16 16:35 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Jonathan Tan, git@vger.kernel.org,
	Jonathan Nieder

Hi,

was about to write that we are maybe overly cautious here. Because the
current way a submodule ends up in the list to be pushed is through:

    find_unpushed_submodules()

that itself collects all changed submodules when submodule_needs_pushing() is
true. In there we have this:

    if (!submodule_has_commits(path, commits))
        /*
         * NOTE: We do consider it safe to return "no" here. The
         * correct answer would be "We do not know" instead of
         * "No push needed", but it is quite hard to change
         * the submodule pointer without having the submodule
         * around. If a user did however change the submodules
         * without having the submodule around, this indicates
         * an expert who knows what they are doing or a
         * maintainer integrating work from other people. In
         * both cases it should be safe to skip this check.
         */
        return 0;

So if the check, whether a submodule has commits, fails for any reason it will
not end up in the list to be pushed.

As a side note: inside submodule_has_commits() there is an add_submodule_odb()
followed by a process to really make sure that the commits are in the
submodule.

So IMO at this point we can be sure that the *database* exists and this extra
check could be dropped if we said that a caller to push_submodule() should make
sure that the submodule exists. The current ones are doing it already (if I did
not miss anything).

On Tue, Aug 15, 2017 at 06:05:25PM -0700, Stefan Beller wrote:
> On Tue, Aug 15, 2017 at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Stefan Beller <sbeller@google.com> writes:
> >
> >>> Is "is it populated" a good thing to check here, though?  IIRC,
> >>> add-submodule-odb allows you to add the object database of an
> >>> inactivated submodule, so this seems to change the behaviour.  I do
> >>> not know if the behaviour change is a good thing (i.e. bugfix) or
> >>> not (i.e. regression) offhand, though.
> >>
> >> Good point, we should be able to push non-populated, even inactive(?)
> >> submodules. For that we strictly need add_submodule_odb here
> >> (or the repo object of the submodule, eventually).
> >>
> >> So let's retract this patch for now.
> >
> > Not so fast.
> 
> Ok, I took another look at the code.
> 
> While we may desire that un-populated submodules can be pushed
> (due to checking out another revision where the submodule
> doesn't exist, before pushing), this is not supported currently, because
> the call to run the push in the submodule assumes there is a
> "<path>/.git" on which the child process can operate.
> So for now we HAVE to have the submodule populated.

That is a good point though. In the current form of push_submodule() we need to
have a populated submodule. So IMO to check whether the submodule is actually
*populated* instead of adding the odb is correct and a possible bug fix.

Cheers Heiko

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-08-16 16:36 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-12 23:45 [PATCH] submodule: use cheaper check for submodule pushes Stefan Beller
2017-07-13  0:01 ` Jonathan Nieder
2017-07-13  0:09   ` Stefan Beller
2017-07-13  0:53   ` Junio C Hamano
2017-07-13  5:14     ` Stefan Beller
2017-07-13 18:37       ` Junio C Hamano
2017-07-13 19:39         ` Stefan Beller
2017-07-13 20:48           ` Jonathan Nieder
2017-07-13 20:54             ` Stefan Beller
2017-08-15 22:43               ` [PATCH] push: do not add submodule odb as an alternate when recursing on demand Stefan Beller
2017-08-15 23:10                 ` Jonathan Nieder
2017-08-15 23:14                 ` Jonathan Nieder
2017-08-15 23:27                   ` Stefan Beller
2017-08-15 23:23                 ` Junio C Hamano
2017-08-15 23:31                   ` Stefan Beller
2017-08-16  0:11                     ` Junio C Hamano
2017-08-16  1:05                       ` Stefan Beller
2017-08-16  2:08                         ` Jonathan Nieder
2017-08-16  5:52                           ` Stefan Beller
2017-08-16 16:35                         ` Heiko Voigt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).