git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Dealing with a lot of submodules
@ 2016-06-09 19:06 Stefan Beller
  2016-06-09 19:06 ` [PATCH 1/2] submodule--helper: initial clone learns retry logic Stefan Beller
  2016-06-09 19:06 ` [PATCH 2/2] submodule update: continue when a clone fails Stefan Beller
  0 siblings, 2 replies; 8+ messages in thread
From: Stefan Beller @ 2016-06-09 19:06 UTC (permalink / raw
  To: gitster; +Cc: git, Jens.Lehmann, Stefan Beller

We have a test repo with about 500 submodules, and I noticed some problems
when cloning this repo.  This is a series that helps dealing with that repo
in two ways:

* When having 500 submodules, you have to perform 500 clones. This makes an
  ephemeral error 500 times more likely. To cope with such errors, just try
  again after all other clones have finished.

* If a recursive clone fails for another reason (in our case a missing
  .gitmodules file), we want to keep going to finish the clone, instead of
  failing.
  
Thanks,
Stefan

Stefan Beller (2):
  submodule--helper: initial clone learns retry logic
  submodule update: continue when a clone fails

 builtin/submodule--helper.c | 44 ++++++++++++++++++++++++++++++++++++--------
 git-submodule.sh            |  2 +-
 2 files changed, 37 insertions(+), 9 deletions(-)

-- 
2.9.0.rc2.368.gdadd65c

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] submodule--helper: initial clone learns retry logic
  2016-06-09 19:06 [PATCH 0/2] Dealing with a lot of submodules Stefan Beller
@ 2016-06-09 19:06 ` Stefan Beller
  2016-06-09 19:19   ` Junio C Hamano
  2016-06-09 19:06 ` [PATCH 2/2] submodule update: continue when a clone fails Stefan Beller
  1 sibling, 1 reply; 8+ messages in thread
From: Stefan Beller @ 2016-06-09 19:06 UTC (permalink / raw
  To: gitster; +Cc: git, Jens.Lehmann, Stefan Beller

Each submodule that is attempted to be cloned, will be retried once in
case of failure after all other submodules were cloned. This helps to
mitigate ephemeral server failures and increases chances of a reliable
clone of a repo with hundreds of submodules immensely.

The choice of the priority queue is a bit miss-leading as we use it as a
standard queue, but the priority queue offers the best API to use.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c7deb55..efb6759 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -12,6 +12,7 @@
 #include "remote.h"
 #include "refs.h"
 #include "connect.h"
+#include "prio-queue.h"
 
 static char *get_default_remote(void)
 {
@@ -568,6 +569,12 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int compare_ce(const void *one, const void *two, void *cb_data)
+{
+	const struct cache_entry *ce_one = one, *ce_two = two;
+	return ce_two - ce_one;
+}
+
 struct submodule_update_clone {
 	/* index into 'list', the list of submodules to look into for cloning */
 	int current;
@@ -590,10 +597,13 @@ struct submodule_update_clone {
 
 	/* If we want to stop as fast as possible and return an error */
 	unsigned quickstop : 1;
+
+	/* Queue of failed clones, containing the cache_entry */
+	struct prio_queue failed_queue;
 };
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
 	SUBMODULE_UPDATE_STRATEGY_INIT, 0, -1, NULL, NULL, NULL, NULL, \
-	STRING_LIST_INIT_DUP, 0}
+	STRING_LIST_INIT_DUP, 0, { compare_ce }}
 
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
@@ -718,23 +728,36 @@ cleanup:
 static int update_clone_get_next_task(struct child_process *child,
 				      struct strbuf *err,
 				      void *suc_cb,
-				      void **void_task_cb)
+				      void **ce_task_cb)
 {
 	struct submodule_update_clone *suc = suc_cb;
+	const struct cache_entry *ce;
 
 	for (; suc->current < suc->list.nr; suc->current++) {
-		const struct cache_entry *ce = suc->list.entries[suc->current];
+		ce = suc->list.entries[suc->current];
 		if (prepare_to_clone_next_submodule(ce, child, suc, err)) {
+			*ce_task_cb = (struct cache_entry *) ce;
 			suc->current++;
 			return 1;
 		}
 	}
+	/*
+	 * The loop above tried cloning each submodule once,
+	 * now try the stragglers again.
+	 */
+	ce = (struct cache_entry *) prio_queue_get(&suc->failed_queue);
+	if (ce) {
+		if (!prepare_to_clone_next_submodule(ce, child, suc, err))
+			die("BUG: ce was a submodule before?");
+		return 1;
+	}
+
 	return 0;
 }
 
 static int update_clone_start_failure(struct strbuf *err,
 				      void *suc_cb,
-				      void *void_task_cb)
+				      void *ce_task_cb)
 {
 	struct submodule_update_clone *suc = suc_cb;
 	suc->quickstop = 1;
@@ -744,15 +767,18 @@ static int update_clone_start_failure(struct strbuf *err,
 static int update_clone_task_finished(int result,
 				      struct strbuf *err,
 				      void *suc_cb,
-				      void *void_task_cb)
+				      void *ce_task_cb)
 {
 	struct submodule_update_clone *suc = suc_cb;
+	struct cache_entry *ce = ce_task_cb;
 
 	if (!result)
 		return 0;
 
-	suc->quickstop = 1;
-	return 1;
+	strbuf_addf(err, _("Failed to clone '%s'. Scheduled for retry"),
+			   ce->name);
+	prio_queue_put(&suc->failed_queue, ce);
+	return 0;
 }
 
 static int update_clone(int argc, const char **argv, const char *prefix)
-- 
2.9.0.rc2.368.gdadd65c

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] submodule update: continue when a clone fails
  2016-06-09 19:06 [PATCH 0/2] Dealing with a lot of submodules Stefan Beller
  2016-06-09 19:06 ` [PATCH 1/2] submodule--helper: initial clone learns retry logic Stefan Beller
@ 2016-06-09 19:06 ` Stefan Beller
  1 sibling, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2016-06-09 19:06 UTC (permalink / raw
  To: gitster; +Cc: git, Jens.Lehmann, Stefan Beller

In 15ffb7cde48 (2011-06-13, submodule update: continue when a checkout
fails), we reasoned it is ok to continue, when there is not much of
a mental burden by the failure. If a recursive submodule fails to clone
because a .gitmodules file is broken (e.g. :
fatal: No url found for submodule path 'foo/bar' in .gitmodules
Failed to recurse into submodule path 'foo', signaled by exit code 128),
this is one of the cases where the user is not expected to have much of
a burden afterwards, so we can also continue in that case.

This means we only want to stop for updating submodules in case of rebase,
merge or custom update command failures, which are all signaled with
exit code 2.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index b39ac10..7c62b53 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -705,7 +705,7 @@ cmd_update()
 			if test $res -gt 0
 			then
 				die_msg="$(eval_gettext "Failed to recurse into submodule path '\$displaypath'")"
-				if test $res -eq 1
+				if test $res -ne 2
 				then
 					err="${err};$die_msg"
 					continue
-- 
2.9.0.rc2.368.gdadd65c

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] submodule--helper: initial clone learns retry logic
  2016-06-09 19:06 ` [PATCH 1/2] submodule--helper: initial clone learns retry logic Stefan Beller
@ 2016-06-09 19:19   ` Junio C Hamano
  2016-06-09 19:47     ` Stefan Beller
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2016-06-09 19:19 UTC (permalink / raw
  To: Stefan Beller; +Cc: git, Jens.Lehmann

Stefan Beller <sbeller@google.com> writes:

> +static int compare_ce(const void *one, const void *two, void *cb_data)
> +{
> +	const struct cache_entry *ce_one = one, *ce_two = two;
> +	return ce_two - ce_one;
> +}

This would work in practice, but I suspect that this is not ANSI-C
kosher; as address comparison for ordering (not equality) is
undefined if two pointers are not pointing into the same array or
into the same struct's fields.

I think we have one or two other instances of such fishy pointer
comparison already in the codebase, so it is not a show-stopper, but
it would be better if this can be avoided and be replaced with
something that I do not have to raise eyebrows at ;-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] submodule--helper: initial clone learns retry logic
  2016-06-09 19:19   ` Junio C Hamano
@ 2016-06-09 19:47     ` Stefan Beller
  2016-06-09 19:59       ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Beller @ 2016-06-09 19:47 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Jens Lehmann

On Thu, Jun 9, 2016 at 12:19 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +static int compare_ce(const void *one, const void *two, void *cb_data)
>> +{
>> +     const struct cache_entry *ce_one = one, *ce_two = two;
>> +     return ce_two - ce_one;
>> +}
>
> This would work in practice, but I suspect that this is not ANSI-C
> kosher; as address comparison for ordering (not equality) is
> undefined if two pointers are not pointing into the same array or
> into the same struct's fields.

Uh :(

I thought about that for a while as I was about to send a 'simplification'
patch for commit.c:compare_commits_by_{author,commit}_date
There it made sense to keep it the way it is because
    sizeof(date) > sizeof(return value)

At the time of writing it made sense for this to be a subtractions,
but I think we want to make it

    if (ce_one < ce_two)
        return 1;
    else if (ce_one > ce_two)
        return -1;
    else
        return 0;

instead. But that is still unspecified, so we rather go with

static int compare_ce(const void *one, const void *two, void *cb_data)
{
    const struct cache_entry *ce_one = one, *ce_two = two;
    return strcmp(ce_one->name, ce_two->name);
}

When reading the prio queue code correctly, we can be sure there
is no NULL passed to either of one or two.

>
> I think we have one or two other instances of such fishy pointer
> comparison already in the codebase, so it is not a show-stopper, but
> it would be better if this can be avoided and be replaced with
> something that I do not have to raise eyebrows at ;-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] submodule--helper: initial clone learns retry logic
  2016-06-09 19:47     ` Stefan Beller
@ 2016-06-09 19:59       ` Junio C Hamano
  2016-06-09 20:40         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2016-06-09 19:59 UTC (permalink / raw
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> instead. But that is still unspecified, so we rather go with
>
> static int compare_ce(const void *one, const void *two, void *cb_data)
> {
>     const struct cache_entry *ce_one = one, *ce_two = two;
>     return strcmp(ce_one->name, ce_two->name);
> }

As I said below, I do not think it is worth addressing by making the
code's behaviour on real systems worse.  As long as what you have as
the key into priority queue is a pointer to cache_entry, you cannot
make it better from that point of view.

>> I think we have one or two other instances of such fishy pointer
>> comparison already in the codebase, so it is not a show-stopper, but
>> it would be better if this can be avoided and be replaced with
>> something that I do not have to raise eyebrows at ;-)

If you are using priority queue, it probably would make more sense
to use the "priority" part properly---the paths whose ce happens to
be lower in the address space are not inherently more important and
deserve to be processed sooner.  From "let's retry failed ones after
finishing", I expected that the queue was either fifo (simplest), or
a prio queue whose priority function would give lower priority for
entries with least number of previous attempts (more involved but
probably more fair).

So a proper "fix" probably is either (1) not to use prio queue as
you are not doing anything with priority anyway, or (2) use
something other than ce itself as entries in prio queue, so that the
priority function can be computation that is more meaningful than
"happens to sit in the lower part of the address space".

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] submodule--helper: initial clone learns retry logic
  2016-06-09 19:59       ` Junio C Hamano
@ 2016-06-09 20:40         ` Junio C Hamano
  2016-06-09 23:38           ` Stefan Beller
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2016-06-09 20:40 UTC (permalink / raw
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jens Lehmann

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
>
>> instead. But that is still unspecified, so we rather go with
>>
>> static int compare_ce(const void *one, const void *two, void *cb_data)
>> {
>>     const struct cache_entry *ce_one = one, *ce_two = two;
>>     return strcmp(ce_one->name, ce_two->name);
>> }
>
> As I said below, I do not think it is worth addressing by making the
> code's behaviour on real systems worse.  As long as what you have as
> the key into priority queue is a pointer to cache_entry, you cannot
> make it better from that point of view.

... because having to strcmp() their names would be much more
expensive than the pointer comparison.

However, I think you could use a pointer into a single array as
an element of prio_queue.  I notice here:

 	for (; suc->current < suc->list.nr; suc->current++) {
-		const struct cache_entry *ce = suc->list.entries[suc->current];
+		ce = suc->list.entries[suc->current];
 		if (prepare_to_clone_next_submodule(ce, child, suc, err)) {
+			*ce_task_cb = (struct cache_entry *) ce;
 			suc->current++;
 			return 1;
 		}
 	}
 
that list.entries[] can serve as such an array.  If you pass around
the pointer to its element instead, i.e.

-		ce = suc->list.entries[suc->current];
+		ceP = &suc->list.entries[suc->current];
- 		if (prepare_to_clone_next_submodule(ce, child, suc, err)) {
+ 		if (prepare_to_clone_next_submodule(*ceP, child, suc, err)) {
+			*ce_task_cb = (struct cache_entry *) ce;
-			*ce_task_cb = ceP;
		...
	}
	/*
	 * The loop above tried cloning each submodule once,
	 * now try the stragglers again.
	 */
-	ce = (struct cache_entry *) prio_queue_get(&suc->failed_queue);
+	ceP = (struct cache_entry **) prio_queue_get(&suc->failed_queue);

then the elements you are pushing into prio-queue would not be ce
(pointer to a cache entry), but would be a pointer of an array that
holds many pointers to cache entries, so it becomes kosher to
compare them for ordering.

That would probably not add too much overhead at runtime; it may
have to involve a bit of code restructuring, so I do not know if it
is worth it, though.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] submodule--helper: initial clone learns retry logic
  2016-06-09 20:40         ` Junio C Hamano
@ 2016-06-09 23:38           ` Stefan Beller
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2016-06-09 23:38 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Jens Lehmann

On Thu, Jun 9, 2016 at 1:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> instead. But that is still unspecified, so we rather go with
>>>
>>> static int compare_ce(const void *one, const void *two, void *cb_data)
>>> {
>>>     const struct cache_entry *ce_one = one, *ce_two = two;
>>>     return strcmp(ce_one->name, ce_two->name);
>>> }
>>
>> As I said below, I do not think it is worth addressing by making the
>> code's behaviour on real systems worse.  As long as what you have as
>> the key into priority queue is a pointer to cache_entry, you cannot
>> make it better from that point of view.
>
> ... because having to strcmp() their names would be much more
> expensive than the pointer comparison.
>
> However, I think you could use a pointer into a single array as
> an element of prio_queue.  I notice here:
>
>         for (; suc->current < suc->list.nr; suc->current++) {
> -               const struct cache_entry *ce = suc->list.entries[suc->current];
> +               ce = suc->list.entries[suc->current];
>                 if (prepare_to_clone_next_submodule(ce, child, suc, err)) {
> +                       *ce_task_cb = (struct cache_entry *) ce;
>                         suc->current++;
>                         return 1;
>                 }
>         }
>
> that list.entries[] can serve as such an array.  If you pass around
> the pointer to its element instead, i.e.
>
> -               ce = suc->list.entries[suc->current];
> +               ceP = &suc->list.entries[suc->current];
> -               if (prepare_to_clone_next_submodule(ce, child, suc, err)) {
> +               if (prepare_to_clone_next_submodule(*ceP, child, suc, err)) {
> +                       *ce_task_cb = (struct cache_entry *) ce;
> -                       *ce_task_cb = ceP;
>                 ...
>         }
>         /*
>          * The loop above tried cloning each submodule once,
>          * now try the stragglers again.
>          */
> -       ce = (struct cache_entry *) prio_queue_get(&suc->failed_queue);
> +       ceP = (struct cache_entry **) prio_queue_get(&suc->failed_queue);
>
> then the elements you are pushing into prio-queue would not be ce
> (pointer to a cache entry), but would be a pointer of an array that
> holds many pointers to cache entries, so it becomes kosher to
> compare them for ordering.
>
> That would probably not add too much overhead at runtime; it may
> have to involve a bit of code restructuring, so I do not know if it
> is worth it, though.
>

I realized the patch above was bogus for another reason:
In update_clone_task_finished we never distinguish between first-failing
submodules and repeated fails, such that we may end up in an infinite loop.

So I think the easiest way forward would be if we would pass around the index
and then we can properly determine if we tried that already. (Though casting
pointer to int is not possible, so we'll pass around a pointer to an int)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-06-09 23:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-09 19:06 [PATCH 0/2] Dealing with a lot of submodules Stefan Beller
2016-06-09 19:06 ` [PATCH 1/2] submodule--helper: initial clone learns retry logic Stefan Beller
2016-06-09 19:19   ` Junio C Hamano
2016-06-09 19:47     ` Stefan Beller
2016-06-09 19:59       ` Junio C Hamano
2016-06-09 20:40         ` Junio C Hamano
2016-06-09 23:38           ` Stefan Beller
2016-06-09 19:06 ` [PATCH 2/2] submodule update: continue when a clone fails Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).