git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v2 0/3] Speedup finding of unpushed submodules
@ 2016-10-07 15:06 Heiko Voigt
  2016-10-07 15:06 ` [PATCH v2 1/3] serialize collection of changed submodules Heiko Voigt
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Heiko Voigt @ 2016-10-07 15:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Heiko Voigt, Jeff King, Stefan Beller, git, Jens.Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

You can find the first iteration of this series as part of this thread:

http://public-inbox.org/git/%3C20160914173124.GA7613@sandbox%3E/

All mentioned issues should be fixed. I dropped the last patch which was
the cause of the broken tests.

This should optimize every part of this test to a nice speed if you are
pushing to a remote. The only case that is still broken/slow as hell is
when calling push with a direct url.

I am wondering whether we should error out with a "not implemented"
message and mention that --recurse-submodules does not work with direct
URLs. But we might want to have another look at performance with this
patch included. Maybe it is actually usable with the last patch
included, which was not yet on pu.

Cheers Heiko

Heiko Voigt (3):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call

 submodule.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++--------------
 submodule.h |   5 +--
 transport.c |  29 ++++++++++-----
 3 files changed, 114 insertions(+), 36 deletions(-)

-- 
2.10.1.637.g09b28c5



* [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-07 15:06 [PATCH v2 0/3] Speedup finding of unpushed submodules Heiko Voigt
@ 2016-10-07 15:06 ` Heiko Voigt
  2016-10-07 17:59   ` Stefan Beller
  2016-10-07 15:06 ` [PATCH v2 2/3] serialize collection of refs that contain submodule changes Heiko Voigt
  2016-10-07 15:06 ` [PATCH v2 3/3] batch check whether submodule needs pushing into one call Heiko Voigt
  2 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2016-10-07 15:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Heiko Voigt, Jeff King, Stefan Beller, git, Jens.Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

To check whether a submodule needs to be pushed we need to collect all
changed submodules. Let's collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issue that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
 submodule.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2de06a3351..59c9d15905 100644
--- a/submodule.c
+++ b/submodule.c
@@ -554,19 +554,34 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20
 	return 0;
 }
 
+static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
+		const char *path)
+{
+	struct string_list_item *item;
+
+	item = string_list_insert(submodules, path);
+	if (item->util)
+		return (struct sha1_array *) item->util;
+
+	/* NEEDSWORK: should we have sha1_array_init()? */
+	item->util = xcalloc(1, sizeof(struct sha1_array));
+	return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 					 struct diff_options *options,
 					 void *data)
 {
 	int i;
-	struct string_list *needs_pushing = data;
+	struct string_list *submodules = data;
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
+		struct sha1_array *hashes;
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
-		if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-			string_list_insert(needs_pushing, p->two->path);
+		hashes = get_sha1s_from_list(submodules, p->two->path);
+		sha1_array_append(hashes, p->two->oid.hash);
 	}
 }
 
@@ -582,14 +597,41 @@ static void find_unpushed_submodule_commits(struct commit *commit,
 	diff_tree_combined_merge(commit, 1, &rev);
 }
 
+struct collect_submodule_from_sha1s_data {
+	char *submodule_path;
+	struct string_list *needs_pushing;
+};
+
+static void collect_submodules_from_sha1s(const unsigned char sha1[20],
+		void *data)
+{
+	struct collect_submodule_from_sha1s_data *me =
+		(struct collect_submodule_from_sha1s_data *) data;
+
+	if (submodule_needs_pushing(me->submodule_path, sha1))
+		string_list_insert(me->needs_pushing, me->submodule_path);
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+	int i;
+	for (i = 0; i < submodules->nr; i++) {
+		struct string_list_item *item = &submodules->items[i];
+		struct sha1_array *hashes = (struct sha1_array *) item->util;
+		sha1_array_clear(hashes);
+	}
+	string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
 		const char *remotes_name, struct string_list *needs_pushing)
 {
 	struct rev_info rev;
 	struct commit *commit;
 	const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-	int argc = ARRAY_SIZE(argv) - 1;
+	int argc = ARRAY_SIZE(argv) - 1, i;
 	char *sha1_copy;
+	struct string_list submodules = STRING_LIST_INIT_DUP;
 
 	struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
 		die("revision walk setup failed");
 
 	while ((commit = get_revision(&rev)) != NULL)
-		find_unpushed_submodule_commits(commit, needs_pushing);
+		find_unpushed_submodule_commits(commit, &submodules);
 
 	reset_revision_walk();
 	free(sha1_copy);
 	strbuf_release(&remotes_arg);
 
+	for (i = 0; i < submodules.nr; i++) {
+		struct string_list_item *item = &submodules.items[i];
+		struct collect_submodule_from_sha1s_data data;
+		data.submodule_path = item->string;
+		data.needs_pushing = needs_pushing;
+		sha1_array_for_each_unique((struct sha1_array *) item->util,
+				collect_submodules_from_sha1s,
+				&data);
+	}
+	free_submodules_sha1s(&submodules);
+
 	return needs_pushing->nr;
 }
 
-- 
2.10.1.637.g09b28c5



* [PATCH v2 2/3] serialize collection of refs that contain submodule changes
  2016-10-07 15:06 [PATCH v2 0/3] Speedup finding of unpushed submodules Heiko Voigt
  2016-10-07 15:06 ` [PATCH v2 1/3] serialize collection of changed submodules Heiko Voigt
@ 2016-10-07 15:06 ` Heiko Voigt
  2016-10-07 18:16   ` Stefan Beller
  2016-10-10 22:48   ` Junio C Hamano
  2016-10-07 15:06 ` [PATCH v2 3/3] batch check whether submodule needs pushing into one call Heiko Voigt
  2 siblings, 2 replies; 19+ messages in thread
From: Heiko Voigt @ 2016-10-07 15:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Heiko Voigt, Jeff King, Stefan Beller, git, Jens.Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref,
let's first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
 submodule.c | 36 +++++++++++++++++++++---------------
 submodule.h |  5 +++--
 transport.c | 29 +++++++++++++++++++++--------
 3 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 59c9d15905..5044afc2f8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -522,6 +522,13 @@ static int has_remote(const char *refname, const struct object_id *oid,
 	return 1;
 }
 
+static int append_hash_to_argv(const unsigned char sha1[20], void *data)
+{
+	struct argv_array *argv = (struct argv_array *) data;
+	argv_array_push(argv, sha1_to_hex(sha1));
+	return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
 {
 	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -623,24 +630,24 @@ static void free_submodules_sha1s(struct string_list *submodules)
 	string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *hashes,
 		const char *remotes_name, struct string_list *needs_pushing)
 {
 	struct rev_info rev;
 	struct commit *commit;
-	const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-	int argc = ARRAY_SIZE(argv) - 1, i;
-	char *sha1_copy;
+	int i;
 	struct string_list submodules = STRING_LIST_INIT_DUP;
+	struct argv_array argv = ARGV_ARRAY_INIT;
 
-	struct strbuf remotes_arg = STRBUF_INIT;
-
-	strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name);
 	init_revisions(&rev, NULL);
-	sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-	argv[1] = sha1_copy;
-	argv[3] = remotes_arg.buf;
-	setup_revisions(argc, argv, &rev, NULL);
+
+	/* argv.argv[0] will be ignored by setup_revisions */
+	argv_array_push(&argv, "find_unpushed_submodules");
+	sha1_array_for_each_unique(hashes, append_hash_to_argv, &argv);
+	argv_array_push(&argv, "--not");
+	argv_array_pushf(&argv, "--remotes=%s", remotes_name);
+
+	setup_revisions(argv.argc, argv.argv, &rev, NULL);
 	if (prepare_revision_walk(&rev))
 		die("revision walk setup failed");
 
@@ -648,8 +655,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
 		find_unpushed_submodule_commits(commit, &submodules);
 
 	reset_revision_walk();
-	free(sha1_copy);
-	strbuf_release(&remotes_arg);
+	argv_array_clear(&argv);
 
 	for (i = 0; i < submodules.nr; i++) {
 		struct string_list_item *item = &submodules.items[i];
@@ -687,12 +693,12 @@ static int push_submodule(const char *path)
 	return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name)
+int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name)
 {
 	int i, ret = 1;
 	struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-	if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing))
+	if (!find_unpushed_submodules(hashes, remotes_name, &needs_pushing))
 		return 1;
 
 	for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a948..065b2f0a2a 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
 	RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20],
 		    const unsigned char a[20], const unsigned char b[20], int search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name,
+int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name,
 		struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
+int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index 94d6dc3725..05f2ce83f1 100644
--- a/transport.c
+++ b/transport.c
@@ -903,23 +903,36 @@ int transport_push(struct transport *transport,
 
 		if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) {
 			struct ref *ref = remote_refs;
+			struct sha1_array hashes = SHA1_ARRAY_INIT;
+
 			for (; ref; ref = ref->next)
-				if (!is_null_oid(&ref->new_oid) &&
-				    !push_unpushed_submodules(ref->new_oid.hash,
-					    transport->remote->name))
-				    die ("Failed to push all needed submodules!");
+				if (!is_null_oid(&ref->new_oid))
+					sha1_array_append(&hashes, ref->new_oid.hash);
+
+			if (!push_unpushed_submodules(&hashes, transport->remote->name)) {
+				sha1_array_clear(&hashes);
+				die ("Failed to push all needed submodules!");
+			}
+			sha1_array_clear(&hashes);
 		}
 
 		if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
 			      TRANSPORT_RECURSE_SUBMODULES_CHECK)) && !is_bare_repository()) {
 			struct ref *ref = remote_refs;
 			struct string_list needs_pushing = STRING_LIST_INIT_DUP;
+			struct sha1_array hashes = SHA1_ARRAY_INIT;
 
 			for (; ref; ref = ref->next)
-				if (!is_null_oid(&ref->new_oid) &&
-				    find_unpushed_submodules(ref->new_oid.hash,
-					    transport->remote->name, &needs_pushing))
-					die_with_unpushed_submodules(&needs_pushing);
+				if (!is_null_oid(&ref->new_oid))
+					sha1_array_append(&hashes, ref->new_oid.hash);
+
+			if (find_unpushed_submodules(&hashes, transport->remote->name,
+						&needs_pushing)) {
+				sha1_array_clear(&hashes);
+				die_with_unpushed_submodules(&needs_pushing);
+			}
+			string_list_clear(&needs_pushing, 0);
+			sha1_array_clear(&hashes);
 		}
 
 		push_ret = transport->push_refs(transport, remote_refs, flags);
-- 
2.10.1.637.g09b28c5



* [PATCH v2 3/3] batch check whether submodule needs pushing into one call
  2016-10-07 15:06 [PATCH v2 0/3] Speedup finding of unpushed submodules Heiko Voigt
  2016-10-07 15:06 ` [PATCH v2 1/3] serialize collection of changed submodules Heiko Voigt
  2016-10-07 15:06 ` [PATCH v2 2/3] serialize collection of refs that contain submodule changes Heiko Voigt
@ 2016-10-07 15:06 ` Heiko Voigt
  2016-10-07 18:30   ` Stefan Beller
  2016-10-10 22:56   ` Junio C Hamano
  2 siblings, 2 replies; 19+ messages in thread
From: Heiko Voigt @ 2016-10-07 15:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Heiko Voigt, Jeff King, Stefan Beller, git, Jens.Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Let's do that so we can speed up the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
 submodule.c | 63 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5044afc2f8..a05c2a34b1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char sha1[20], void *data)
 	return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
+static int check_has_hash(const unsigned char sha1[20], void *data)
 {
-	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+	int *has_hash = (int *) data;
+
+	if (!lookup_commit_reference(sha1))
+		*has_hash = 0;
+
+	return 0;
+}
+
+static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
+{
+	int has_hash = 1;
+
+	if (add_submodule_odb(path))
+		return 0;
+
+	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
+	return has_hash;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
+{
+	if (!submodule_has_hashes(path, hashes))
 		return 0;
 
 	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
 		struct child_process cp = CHILD_PROCESS_INIT;
-		const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL};
 		struct strbuf buf = STRBUF_INIT;
 		int needs_pushing = 0;
 
-		argv[1] = sha1_to_hex(sha1);
-		cp.argv = argv;
+		argv_array_push(&cp.args, "rev-list");
+		sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args);
+		argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL);
+
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.no_stdin = 1;
 		cp.out = -1;
 		cp.dir = path;
 		if (start_command(&cp))
-			die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s",
-				sha1_to_hex(sha1), path);
+			die("Could not run 'git rev-list <hashes> --not --remotes -n 1' command in submodule %s",
+					path);
 		if (strbuf_read(&buf, cp.out, 41))
 			needs_pushing = 1;
 		finish_command(&cp);
@@ -604,21 +626,6 @@ static void find_unpushed_submodule_commits(struct commit *commit,
 	diff_tree_combined_merge(commit, 1, &rev);
 }
 
-struct collect_submodule_from_sha1s_data {
-	char *submodule_path;
-	struct string_list *needs_pushing;
-};
-
-static void collect_submodules_from_sha1s(const unsigned char sha1[20],
-		void *data)
-{
-	struct collect_submodule_from_sha1s_data *me =
-		(struct collect_submodule_from_sha1s_data *) data;
-
-	if (submodule_needs_pushing(me->submodule_path, sha1))
-		string_list_insert(me->needs_pushing, me->submodule_path);
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
 	int i;
@@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
 	argv_array_clear(&argv);
 
 	for (i = 0; i < submodules.nr; i++) {
-		struct string_list_item *item = &submodules.items[i];
-		struct collect_submodule_from_sha1s_data data;
-		data.submodule_path = item->string;
-		data.needs_pushing = needs_pushing;
-		sha1_array_for_each_unique((struct sha1_array *) item->util,
-				collect_submodules_from_sha1s,
-				&data);
+		struct string_list_item *submodule = &submodules.items[i];
+		struct sha1_array *hashes = (struct sha1_array *) submodule->util;
+
+		if (submodule_needs_pushing(submodule->string, hashes))
+			string_list_insert(needs_pushing, submodule->string);
 	}
 	free_submodules_sha1s(&submodules);
 
-- 
2.10.1.637.g09b28c5



* Re: [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-07 15:06 ` [PATCH v2 1/3] serialize collection of changed submodules Heiko Voigt
@ 2016-10-07 17:59   ` Stefan Beller
  2016-10-10 22:43     ` Junio C Hamano
  2016-10-12 13:11     ` Heiko Voigt
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Beller @ 2016-10-07 17:59 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Junio C Hamano, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> To check whether a submodule needs to be pushed we need to collect all
> changed submodules. Let's collect them first and then execute the
> possibly expensive test whether certain revisions are already pushed
> only once per submodule.
>
> There is further potential for optimization since we can assemble one
> command and only issue that instead of one call for each remote ref in
> the submodule.
>
> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
> ---
>  submodule.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 2de06a3351..59c9d15905 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -554,19 +554,34 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20
>         return 0;
>  }
>
> +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
> +               const char *path)

So this will take the string list `submodules` and insert the path into it,
if it wasn't already in there. In case it is newly inserted, add a sha1_array
as util, so each inserted path has its own empty array.

So it both initializes the data structures and retrieves them. I was
initially confused by the name as I assumed it would give you sha1s out
of a string list (e.g. transform strings to internal sha1 things).
Maybe it's just me having a hard time understanding that, but I feel
like the name could be improved.

    lookup_sha1_list_by_path,
    insert_path_and_return_sha1_list ?

> +{
> +       struct string_list_item *item;
> +
> +       item = string_list_insert(submodules, path);
> +       if (item->util)
> +               return (struct sha1_array *) item->util;
> +
> +       /* NEEDSWORK: should we have sha1_array_init()? */
> +       item->util = xcalloc(1, sizeof(struct sha1_array));
> +       return (struct sha1_array *) item->util;
> +}
> +
>  static void collect_submodules_from_diff(struct diff_queue_struct *q,
>                                          struct diff_options *options,
>                                          void *data)
>  {
>         int i;
> -       struct string_list *needs_pushing = data;
> +       struct string_list *submodules = data;
>
>         for (i = 0; i < q->nr; i++) {
>                 struct diff_filepair *p = q->queue[i];
> +               struct sha1_array *hashes;
>                 if (!S_ISGITLINK(p->two->mode))
>                         continue;
> -               if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
> -                       string_list_insert(needs_pushing, p->two->path);
> +               hashes = get_sha1s_from_list(submodules, p->two->path);
> +               sha1_array_append(hashes, p->two->oid.hash);
>         }
>  }
>
> @@ -582,14 +597,41 @@ static void find_unpushed_submodule_commits(struct commit *commit,
>         diff_tree_combined_merge(commit, 1, &rev);
>  }
>
> +struct collect_submodule_from_sha1s_data {
> +       char *submodule_path;
> +       struct string_list *needs_pushing;
> +};
> +
> +static void collect_submodules_from_sha1s(const unsigned char sha1[20],
> +               void *data)
> +{
> +       struct collect_submodule_from_sha1s_data *me =
> +               (struct collect_submodule_from_sha1s_data *) data;
> +
> +       if (submodule_needs_pushing(me->submodule_path, sha1))
> +               string_list_insert(me->needs_pushing, me->submodule_path);
> +}
> +
> +static void free_submodules_sha1s(struct string_list *submodules)
> +{
> +       int i;
> +       for (i = 0; i < submodules->nr; i++) {
> +               struct string_list_item *item = &submodules->items[i];

You do not seem to make use of `i` explicitly, so
for_each_string_list_item might be more readable here?


> +               struct sha1_array *hashes = (struct sha1_array *) item->util;
> +               sha1_array_clear(hashes);
> +       }
> +       string_list_clear(submodules, 1);
> +}
> +
>  int find_unpushed_submodules(unsigned char new_sha1[20],
>                 const char *remotes_name, struct string_list *needs_pushing)
>  {
>         struct rev_info rev;
>         struct commit *commit;
>         const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
> -       int argc = ARRAY_SIZE(argv) - 1;
> +       int argc = ARRAY_SIZE(argv) - 1, i;
>         char *sha1_copy;
> +       struct string_list submodules = STRING_LIST_INIT_DUP;
>
>         struct strbuf remotes_arg = STRBUF_INIT;
>
> @@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
>                 die("revision walk setup failed");
>
>         while ((commit = get_revision(&rev)) != NULL)
> -               find_unpushed_submodule_commits(commit, needs_pushing);
> +               find_unpushed_submodule_commits(commit, &submodules);
>
>         reset_revision_walk();
>         free(sha1_copy);
>         strbuf_release(&remotes_arg);
>
> +       for (i = 0; i < submodules.nr; i++) {
> +               struct string_list_item *item = &submodules.items[i];

You do not seem to make use of `i` explicitly, so
for_each_string_list_item might be more readable here?


> +               struct collect_submodule_from_sha1s_data data;
> +               data.submodule_path = item->string;
> +               data.needs_pushing = needs_pushing;
> +               sha1_array_for_each_unique((struct sha1_array *) item->util,
> +                               collect_submodules_from_sha1s,
> +                               &data);
> +       }
> +       free_submodules_sha1s(&submodules);
> +
>         return needs_pushing->nr;
>  }
>
> --
> 2.10.1.637.g09b28c5
>
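
The for_each_string_list_item suggestion made twice in the review above
can be illustrated with a small stand-alone sketch. The types and the
macro below are hypothetical stand-ins modelled on the idea, not git's
actual string-list.h definitions; the point is only that iterating with
an item pointer removes the unused index variable.

```c
#include <stddef.h>

/* Hypothetical minimal stand-ins for git's string_list types. */
struct list_item {
	const char *string;
	void *util;
};

struct item_list {
	struct list_item *items;
	size_t nr;
};

/*
 * Iterate with an item pointer instead of an index. The extra
 * "(item) &&" test keeps an empty list with items == NULL safe.
 */
#define for_each_list_item(item, list) \
	for ((item) = (list)->items; \
	     (item) && (item) < (list)->items + (list)->nr; \
	     (item)++)

/* Example consumer: count the entries that carry a payload in util. */
size_t count_items_with_util(const struct item_list *list)
{
	const struct list_item *item;
	size_t n = 0;

	for_each_list_item(item, list)
		if (item->util)
			n++;
	return n;
}
```

With such a macro, a cleanup loop like the one in free_submodules_sha1s()
would need neither `i` nor the explicit `&submodules->items[i]` lookup.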


* Re: [PATCH v2 2/3] serialize collection of refs that contain submodule changes
  2016-10-07 15:06 ` [PATCH v2 2/3] serialize collection of refs that contain submodule changes Heiko Voigt
@ 2016-10-07 18:16   ` Stefan Beller
  2016-10-12 13:10     ` Heiko Voigt
  2016-10-10 22:48   ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2016-10-07 18:16 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Junio C Hamano, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> We are iterating over each pushed ref and want to check whether it
> contains changes to submodules. Instead of immediately checking each ref,
> let's first collect them and then do the check for all of them in one
> revision walk.
>
> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
> ---
>  submodule.c | 36 +++++++++++++++++++++---------------
>  submodule.h |  5 +++--
>  transport.c | 29 +++++++++++++++++++++--------
>  3 files changed, 45 insertions(+), 25 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 59c9d15905..5044afc2f8 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -522,6 +522,13 @@ static int has_remote(const char *refname, const struct object_id *oid,
>         return 1;
>  }
>
> +static int append_hash_to_argv(const unsigned char sha1[20], void *data)
> +{
> +       struct argv_array *argv = (struct argv_array *) data;
> +       argv_array_push(argv, sha1_to_hex(sha1));

Nit of the day:
When using struct child_process, we have the old-style NULL-terminated
argv array as well as the new-style `args` argv_array. So in that
context we'd prefer `args` as a name for an argv_array, as that helps to
distinguish it from the old array type. Here, however, `argv` seems to
be a reasonable name; in fact, whenever we do not deal with child
processes, we seem to not like the `args` name:

    $ git grep argv_array |wc -l
    577
    $ git grep argv_array |grep args |wc -l
    293

The rest looks good to me. :)
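
The "collect first, walk once" idea of this patch can be sketched in
isolation. Everything below is a simplified, hypothetical stand-in
(`struct ref_update`, `struct hash_list` and the helper names are not
git's API); it only shows phase one, gathering every non-null new value
from the ref list so that a single later call can handle all of them.

```c
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20

/* Hypothetical, simplified stand-in for git's struct ref. */
struct ref_update {
	unsigned char new_hash[HASH_LEN];
	struct ref_update *next;
};

/* Hypothetical stand-in for sha1_array. */
struct hash_list {
	unsigned char (*hash)[HASH_LEN];
	size_t nr, alloc;
};

/* An all-zero hash marks a ref deletion, which has nothing to check. */
static int is_null_hash(const unsigned char *hash)
{
	static const unsigned char null_hash[HASH_LEN];
	return !memcmp(hash, null_hash, HASH_LEN);
}

void hash_list_append(struct hash_list *list, const unsigned char *hash)
{
	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 8;
		unsigned char (*grown)[HASH_LEN] =
			realloc(list->hash, alloc * sizeof(*grown));
		if (!grown)
			abort();
		list->hash = grown;
		list->alloc = alloc;
	}
	memcpy(list->hash[list->nr++], hash, HASH_LEN);
}

/* Phase one: gather the hashes; no expensive work happens here. */
void collect_pushed_hashes(const struct ref_update *refs,
			   struct hash_list *out)
{
	for (; refs; refs = refs->next)
		if (!is_null_hash(refs->new_hash))
			hash_list_append(out, refs->new_hash);
}
```

The expensive step (one revision walk in the patch) would then run once
over the collected list instead of once per ref.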


* Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
  2016-10-07 15:06 ` [PATCH v2 3/3] batch check whether submodule needs pushing into one call Heiko Voigt
@ 2016-10-07 18:30   ` Stefan Beller
  2016-10-10 22:56   ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2016-10-07 18:30 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Junio C Hamano, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> We run a command for each sha1 change in a submodule. This is
> unnecessary since we can simply batch all sha1's we want to check into
> one command. Let's do that so we can speed up the check when many submodule
> changes are in need of checking.
>
> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
> ---
>  submodule.c | 63 +++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 5044afc2f8..a05c2a34b1 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char sha1[20], void *data)
>         return 0;
>  }
>
> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_hash(const unsigned char sha1[20], void *data)
>  {
> -       if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +       int *has_hash = (int *) data;
> +
> +       if (!lookup_commit_reference(sha1))
> +               *has_hash = 0;
> +
> +       return 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> +       int has_hash = 1;
> +
> +       if (add_submodule_odb(path))
> +               return 0;
> +
> +       sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> +       return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> +{
> +       if (!submodule_has_hashes(path, hashes))

So the above is an implicit lookup already, but we did that before,
too, so it's fine.

> @@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
>         argv_array_clear(&argv);
>
>         for (i = 0; i < submodules.nr; i++) {
> -               struct string_list_item *item = &submodules.items[i];
> -               struct collect_submodule_from_sha1s_data data;
> -               data.submodule_path = item->string;
> -               data.needs_pushing = needs_pushing;
> -               sha1_array_for_each_unique((struct sha1_array *) item->util,
> -                               collect_submodules_from_sha1s,
> -                               &data);
> +               struct string_list_item *submodule = &submodules.items[i];
> +               struct sha1_array *hashes = (struct sha1_array *) submodule->util;
> +
> +               if (submodule_needs_pushing(submodule->string, hashes))
> +                       string_list_insert(needs_pushing, submodule->string);

That makes sense.

Thanks!
Stefan
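
The batching that this patch introduces, one rev-list invocation for all
hashes of a submodule instead of one per hash, amounts to assembling a
single argument vector. A minimal sketch, assuming the hashes are already
available as hex strings; `build_rev_list_args` is a hypothetical helper,
not part of git, and unlike sha1_array_for_each_unique() it does not
remove duplicates.

```c
#include <stdlib.h>

/*
 * Build a NULL-terminated argument vector of the form
 *   rev-list <hex1> <hex2> ... --not --remotes -n 1
 * The caller owns the returned array (but not the strings it points
 * to) and releases it with free().
 */
const char **build_rev_list_args(const char *const *hex_hashes, size_t nr)
{
	static const char *const tail[] = { "--not", "--remotes", "-n", "1" };
	const size_t ntail = sizeof(tail) / sizeof(tail[0]);
	size_t i, n = 0;
	/* command name + hashes + fixed tail + terminating NULL */
	const char **args = malloc((1 + nr + ntail + 1) * sizeof(*args));

	if (!args)
		abort();
	args[n++] = "rev-list";
	for (i = 0; i < nr; i++)
		args[n++] = hex_hashes[i];
	for (i = 0; i < ntail; i++)
		args[n++] = tail[i];
	args[n] = NULL;
	return args;
}
```

Because of `-n 1`, any output at all from the single command means at
least one of the listed commits is not reachable from a remote ref, which
is all the caller needs to know.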


* Re: [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-07 17:59   ` Stefan Beller
@ 2016-10-10 22:43     ` Junio C Hamano
  2016-10-12 13:00       ` Heiko Voigt
  2016-10-12 13:11     ` Heiko Voigt
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-10-10 22:43 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Heiko Voigt, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

Stefan Beller <sbeller@google.com> writes:

>> +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
>> +               const char *path)
>
> So this will take the string list `submodules` and insert the path into it,
> if it wasn't already in there. In case it is newly inserted, add a sha1_array
> as util, so each inserted path has its own empty array.
>
> So it both initializes the data structures and retrieves them. I was
> initially confused by the name as I assumed it would give you sha1s out
> of a string list (e.g. transform strings to internal sha1 things).
> Maybe it's just me having a hard time understanding that, but I feel
> like the name could be improved.
>
>     lookup_sha1_list_by_path,
>     insert_path_and_return_sha1_list ?

I do not think either the name or the "find if exists otherwise
initialize one" behaviour is particularly confusing, but I do not
think "maintain a set of sha1_arrays keyed with a string" is such a
widely reusable general concept/construct.  As can be seen easily in
the names of parameters, this function is about maintaining a set of
sha1_arrays keyed by paths to submodules, and I also assume that the
array indexed by path is not meant to be a general purpose "we can
use it to store any 40-hex thing" but to store something specific.

What is that specific thing?  The names of commit objects in the
submodule repository?

I'd prefer to see that exact thing used to construct the function
name for a helper function with specific usage in mind, i.e.
get_commit_object_names_for_submodule_path() or something along that
line.
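
The "find if exists, otherwise initialize one" behaviour discussed in
this subthread can be sketched stand-alone. The types below are
hypothetical stand-ins for string_list and sha1_array, the function name
is simply the one suggested above (not necessarily what the series ended
up using), and the lookup is a linear search for brevity, whereas git's
string_list_insert() keeps the list sorted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for sha1_array: growable list of 20-byte names. */
struct commit_name_array {
	unsigned char (*name)[20];
	size_t nr, alloc;
};

/* Hypothetical stand-in for one list item; `commits` plays item->util. */
struct submodule_entry {
	char *path;
	struct commit_name_array *commits;
};

struct submodule_table {
	struct submodule_entry *items;
	size_t nr, alloc;
};

/* Return the array stored for `path`, creating an empty one on first use. */
struct commit_name_array *get_commit_object_names_for_submodule_path(
		struct submodule_table *table, const char *path)
{
	struct submodule_entry *entry;
	size_t i, len;

	for (i = 0; i < table->nr; i++)
		if (!strcmp(table->items[i].path, path))
			return table->items[i].commits;

	if (table->nr == table->alloc) {
		size_t alloc = table->alloc ? 2 * table->alloc : 4;
		struct submodule_entry *grown =
			realloc(table->items, alloc * sizeof(*grown));
		if (!grown)
			abort();
		table->items = grown;
		table->alloc = alloc;
	}

	entry = &table->items[table->nr];
	len = strlen(path) + 1;
	entry->path = malloc(len);
	entry->commits = calloc(1, sizeof(*entry->commits));
	if (!entry->path || !entry->commits)
		abort();
	memcpy(entry->path, path, len);
	table->nr++;
	return entry->commits;
}
```

Calling the helper twice with the same path yields the same array, so a
caller can append commit names without caring whether the submodule was
seen before.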


* Re: [PATCH v2 2/3] serialize collection of refs that contain submodule changes
  2016-10-07 15:06 ` [PATCH v2 2/3] serialize collection of refs that contain submodule changes Heiko Voigt
  2016-10-07 18:16   ` Stefan Beller
@ 2016-10-10 22:48   ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2016-10-10 22:48 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Jeff King, Stefan Beller, git, Jens.Lehmann, Fredrik Gustafsson,
	Leandro Lucarella

Heiko Voigt <hvoigt@hvoigt.net> writes:

> +static int append_hash_to_argv(const unsigned char sha1[20], void *data)
> +{
> +	struct argv_array *argv = (struct argv_array *) data;
> +	argv_array_push(argv, sha1_to_hex(sha1));
> +	return 0;
> +}

Do we have struct object_id readily available in the caller?

    ... goes and looks ...

No, this is part of sha1_array API, so this callback is perfectly
fine.

* Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
  2016-10-07 15:06 ` [PATCH v2 3/3] batch check whether submodule needs pushing into one call Heiko Voigt
  2016-10-07 18:30   ` Stefan Beller
@ 2016-10-10 22:56   ` Junio C Hamano
  2016-10-12 13:33     ` Heiko Voigt
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-10-10 22:56 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Jeff King, Stefan Beller, git, Jens.Lehmann, Fredrik Gustafsson,
	Leandro Lucarella

Heiko Voigt <hvoigt@hvoigt.net> writes:

> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_hash(const unsigned char sha1[20], void *data)
>  {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_hash = (int *) data;
> +
> +	if (!lookup_commit_reference(sha1))
> +		*has_hash = 0;
> +
> +	return 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> +	int has_hash = 1;
> +
> +	if (add_submodule_odb(path))
> +		return 0;
> +
> +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> +	return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> +{
> +	if (!submodule_has_hashes(path, hashes))
>  		return 0;

Same comment about naming.  

What do check-has-hash and submodule-has-hashes exactly mean by
"hash" in their names?  Because I think what is checked here is
"does the local submodule repository have _all_ the commits
referenced from the superproject commit we are pushing?", so I'd
prefer to see "commit" in their names.

If we do not even have these commits locally, then there is no point
attempting to push, so returning 0 (i.e. it is not "needs pushing"
situation) is correct but it is a bit subtle.  It's not "we know
they already have them", but it is "even if we tried to push, it
won't do us or the other side any good."  A single-liner in-code
comment may help.
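
For illustration only, here is a minimal self-contained sketch of the
"does the local submodule have _all_ the commits" check with a one-line
comment at the early return. It is not code from the patch: struct
commit_id, struct local_odb, for_each_commit() and the
remote_lacks_commits parameter are hypothetical stand-ins for git's
sha1_array, object database and rev-list check, and unlike the posted
callback this one stops at the first missing commit.

```c
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

struct commit_id {
	unsigned char hash[HASH_LEN];
};

/* Hypothetical stand-in for the submodule's local object database. */
struct local_odb {
	const struct commit_id *commits;
	size_t nr;
};

struct has_commits_data {
	const struct local_odb *odb;
	int result;
};

typedef int (*each_commit_fn)(const struct commit_id *id, void *data);

/* Call fn for each commit; a non-zero return stops the iteration. */
static void for_each_commit(const struct commit_id *ids, size_t nr,
			    each_commit_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (fn(&ids[i], data))
			break;
}

/* Callback: clear the result as soon as one commit is missing locally. */
static int check_has_commit(const struct commit_id *id, void *data)
{
	struct has_commits_data *cb = data;
	size_t i;

	for (i = 0; i < cb->odb->nr; i++)
		if (!memcmp(cb->odb->commits[i].hash, id->hash, HASH_LEN))
			return 0;
	cb->result = 0;
	return 1; /* the answer is already known */
}

/* Does the local submodule have _all_ of the referenced commits? */
static int submodule_has_commits(const struct local_odb *odb,
				 const struct commit_id *ids, size_t nr)
{
	struct has_commits_data cb = { odb, 1 };

	for_each_commit(ids, nr, check_has_commit, &cb);
	return cb.result;
}

static int submodule_needs_pushing(const struct local_odb *odb,
				   const struct commit_id *ids, size_t nr,
				   int remote_lacks_commits)
{
	if (!submodule_has_commits(odb, ids, nr))
		/* Not available locally either, so pushing cannot help. */
		return 0;
	return remote_lacks_commits;
}
```

With "commit" in the names, the early return reads as "we cannot supply
these commits", which is the subtle point the comment is meant to record.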

Thanks.

* Re: [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-10 22:43     ` Junio C Hamano
@ 2016-10-12 13:00       ` Heiko Voigt
  2016-10-12 17:18         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2016-10-12 13:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Mon, Oct 10, 2016 at 03:43:13PM -0700, Junio C Hamano wrote:
> Stefan Beller <sbeller@google.com> writes:
> 
> >> +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
> >> +               const char *path)
> >
> > So this will take the stringlist `submodules` and insert the path into it,
> > if it wasn't already in there. In case it is newly inserted, add a sha1_array
> > as util, so each inserted path has its own empty array.
> >
> > So it is both init of the data structures as well as retrieving them. I was
> > initially confused by the name as I assumed it would give you sha1s out
> > of a string list (e.g. transform strings to internal sha1 things).
> > Maybe it's just me having a hard time understanding that, but I feel
> > like the name could be improved.
> >
> >     lookup_sha1_list_by_path,
> >     insert_path_and_return_sha1_list ?
> 
> I do not think either the name or the "find if exists otherwise
> initialize one" behaviour is particularly confusing, but I do not
> think "maintain a set of sha1_arrays keyed with a string" is such a
> widely reusable general concept/construct.  As can be seen easily in
> the names of parameters, this function is about maintaining a set of
> sha1_arrays keyed by paths to submodules, and I also assume that the
> array indexed by path is not meant to be a general purpose "we can
> use it to store any 40-hex thing" but to store something specific.
> 
> What is that specific thing?  The names of commit objects in the
> submodule repository?
> 
> I'd prefer to see that exact thing used to construct the function
> name for a helper function with specific usage in mind, i.e.
> get_commit_object_names_for_submodule_path() or something along that
> line.

I did not name this function too precisely to keep its name short since
everything specific was quite long, like the suggestion from Junio.

Since this is a static function local to the submodule file I was
assuming anyone interested would just look up the usage and immediately
see the purpose. If I look into submodule-cache.c where I have a similar
functionality we used 'lookup_or_create' for this create on demand
functionality. So a function name would be:

	lookup_or_create_commit_objects_for_submodule_path(...

Which seems excessively long for a static function so how about
we shorten it a bit and add a comment:

	/* lookup or create commit object list for submodule */
	get_commit_objects_for_submodule_path(...

?

Cheers Heiko

* Re: [PATCH v2 2/3] serialize collection of refs that contain submodule changes
  2016-10-07 18:16   ` Stefan Beller
@ 2016-10-12 13:10     ` Heiko Voigt
  2016-10-20 23:00       ` Stefan Beller
  0 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2016-10-12 13:10 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Fri, Oct 07, 2016 at 11:16:31AM -0700, Stefan Beller wrote:
> > diff --git a/submodule.c b/submodule.c
> > index 59c9d15905..5044afc2f8 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -522,6 +522,13 @@ static int has_remote(const char *refname, const struct object_id *oid,
> >         return 1;
> >  }
> >
> > +static int append_hash_to_argv(const unsigned char sha1[20], void *data)
> > +{
> > +       struct argv_array *argv = (struct argv_array *) data;
> > +       argv_array_push(argv, sha1_to_hex(sha1));
> 
> Nit of the day:
> When using the struct child_process, we have the old-style NULL-terminated
> argv array as well as the new-style args argv_array. So in that context
> we'd prefer `args` as a name for argv_array, as that helps to distinguish
> it from the old array type. Here however `argv` seems to be a reasonable
> name; in fact whenever we do not deal with child processes, we seem to
> not like the `args` name:
> 
>     $ git grep argv_array |wc -l
>     577
>     $ git grep argv_array |grep args |wc -l
>     293
> 
> The rest looks good to me. :)

Thanks. So I do not completely get what you are suggesting: args or keep
it the way it is? Since in the end you are saying it is ok here ;) I
mainly chose this name because I am substituting the argv variable which
is already called 'argv' with this array. That might also be the reason
why in so many locations with struct child_process we have the 'argv'
name: Because they initially started with the old-style NULL terminated
array.

I am fine with it either way. Just tell me what you like :)

Cheers Heiko

* Re: [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-07 17:59   ` Stefan Beller
  2016-10-10 22:43     ` Junio C Hamano
@ 2016-10-12 13:11     ` Heiko Voigt
  1 sibling, 0 replies; 19+ messages in thread
From: Heiko Voigt @ 2016-10-12 13:11 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Fri, Oct 07, 2016 at 10:59:29AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > +static void free_submodules_sha1s(struct string_list *submodules)
> > +{
> > +       int i;
> > +       for (i = 0; i < submodules->nr; i++) {
> > +               struct string_list_item *item = &submodules->items[i];
> 
> You do not seem to make use of `i` explicitly, so
> for_each_string_list_item might be more readable here?

Will change.
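
For illustration only, a minimal self-contained sketch of what the
index-free loop looks like. The struct definitions are simplified
stand-ins for git's string_list, for_each_item() is a local macro
modelled on for_each_string_list_item(), and count_items_with_util() is
a hypothetical helper that exists just to exercise the loop.

```c
#include <stddef.h>

/* Stand-ins for git's string_list types; only the fields used here. */
struct string_list_item {
	const char *string;
	void *util;
};

struct string_list {
	struct string_list_item *items;
	size_t nr;
};

/* Local macro modelled on git's for_each_string_list_item(). */
#define for_each_item(item, list) \
	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)

/* Count the entries that carry a util pointer, without an index variable. */
static size_t count_items_with_util(struct string_list *list)
{
	struct string_list_item *item;
	size_t count = 0;

	for_each_item(item, list)
		if (item->util)
			count++;
	return count;
}
```

The loop body only ever sees `item`, so there is no `i` left to misuse.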

> > @@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
> >                 die("revision walk setup failed");
> >
> >         while ((commit = get_revision(&rev)) != NULL)
> > -               find_unpushed_submodule_commits(commit, needs_pushing);
> > +               find_unpushed_submodule_commits(commit, &submodules);
> >
> >         reset_revision_walk();
> >         free(sha1_copy);
> >         strbuf_release(&remotes_arg);
> >
> > +       for (i = 0; i < submodules.nr; i++) {
> > +               struct string_list_item *item = &submodules.items[i];
> 
> You do not seem to make use of `i` explicitly, so
> for_each_string_list_item might be more readable here?

As above.

Cheers Heiko

* Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
  2016-10-10 22:56   ` Junio C Hamano
@ 2016-10-12 13:33     ` Heiko Voigt
  2016-10-12 17:37       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2016-10-12 13:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Stefan Beller, git, Jens.Lehmann, Fredrik Gustafsson,
	Leandro Lucarella

On Mon, Oct 10, 2016 at 03:56:13PM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
> > -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> > +static int check_has_hash(const unsigned char sha1[20], void *data)
> >  {
> > -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +	int *has_hash = (int *) data;
> > +
> > +	if (!lookup_commit_reference(sha1))
> > +		*has_hash = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> > +{
> > +	int has_hash = 1;
> > +
> > +	if (add_submodule_odb(path))
> > +		return 0;
> > +
> > +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> > +	return has_hash;
> > +}
> > +
> > +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> > +{
> > +	if (!submodule_has_hashes(path, hashes))
> >  		return 0;
> 
> Same comment about naming.  
> 
> What do check-has-hash and submodule-has-hashes exactly mean by
> "hash" in their names?  Because I think what is checked here is
> "does the local submodule repository have _all_ the commits
> referenced from the superproject commit we are pushing?", so I'd
> prefer to see "commit" in their names.
> 
> If we do not even have these commits locally, then there is no point
> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> > situation) is correct but it is a bit subtle.  It's not "we know
> they already have them", but it is "even if we tried to push, it
> won't do us or the other side any good."  A single-liner in-code
> comment may help.

First the naming part. How about:

	submodule_has_commits()

?

Second, as mentioned in a previous answer[1] to this part: I would actually
like to have a die() here instead of blindly proceeding. Since the user
either specified --recurse-submodules=... at the commandline or it was
implicitly enabled because we have submodules in the tree we should be
careful and not push revisions referencing submodules that are not
available at a remote. If we can not properly figure it out I would
suggest to stop and tell the user how to solve the situation. E.g.
either she clones the appropriate submodules or specifies
--no-recurse-submodules on the commandline to tell git that she does not
care.

Returning 0 here means: "No push needed" but the correct answer would
be: "We do not know". Question is what we should do here which I am
planning to address in a separate patch series since that will be
changing behavior.

So how about:


	if (!submodule_has_hashes(path, hashes))
		/* NEEDSWORK: The correct answer here is "We do not
		 * know" instead of "No". We currently proceed pushing
		 * here as if the submodules commits are available on a
		 * remote, which is not always correct. */
		return 0;

What do you think?
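
For illustration only, one way to make the "We do not know" answer
explicit would be a tri-state result instead of an int. Everything in
this sketch is hypothetical and not part of the series: the enum, both
function names, and the flag parameters that stand in for the real
checks and for the --recurse-submodules setting.

```c
/* Hypothetical tri-state answer to "does this submodule need pushing?" */
enum push_check {
	PUSH_NOT_NEEDED,
	PUSH_NEEDED,
	PUSH_UNKNOWN	/* commits are missing locally, so we cannot tell */
};

/*
 * has_all_commits: result of the local availability check.
 * remote_lacks_commits: result of the check against the remote refs.
 */
static enum push_check check_submodule_push(int has_all_commits,
					    int remote_lacks_commits)
{
	if (!has_all_commits)
		return PUSH_UNKNOWN;
	return remote_lacks_commits ? PUSH_NEEDED : PUSH_NOT_NEEDED;
}

/*
 * Policy sketch: stop only when submodule recursion was requested and
 * the state of the submodule could not be determined.
 */
static int should_abort_push(enum push_check state, int recurse_submodules)
{
	return recurse_submodules && state == PUSH_UNKNOWN;
}
```

Keeping the policy in a separate function means the behaviour change can
be decided later without touching the check itself.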

Cheers Heiko

[1] http://public-inbox.org/git/20160919195812.GC62429@book.hvoigt.net/

* Re: [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-12 13:00       ` Heiko Voigt
@ 2016-10-12 17:18         ` Junio C Hamano
  2016-10-13 15:27           ` Heiko Voigt
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-10-12 17:18 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Stefan Beller, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

Heiko Voigt <hvoigt@hvoigt.net> writes:

> Which seems excessively long for a static function so how about
> we shorten it a bit and add a comment:
>
> 	/* lookup or create commit object list for submodule */
> 	get_commit_objects_for_submodule_path(...

Or you can even lose "get_" and "path", I guess.  You are not even
"getting" commits but the array that holds them, so the caller can
use it to "get" one of them or it can even use it to "put" a new
one, no?  "get-commit-objects" is a misnomer in that sense.  Either
one of

    get_submodule_commits_array()
    submodule_commits()

perhaps?  I dunno.
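
For illustration only, a minimal self-contained sketch of such a helper
that hands back the array itself, so the caller can read from it or
append to it. The types are simplified stand-ins for string_list and
sha1_array, and commit_array_append() is a hypothetical helper; only the
name submodule_commits() comes from the suggestion above.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for sha1_array: holds 20-byte commit names. */
struct commit_array {
	unsigned char (*sha1)[20];
	size_t nr, alloc;
};

/* Simplified stand-in for a string_list item with the array as util. */
struct submodule_entry {
	char *path;
	struct commit_array *commits;
};

struct submodule_list {
	struct submodule_entry *items;
	size_t nr, alloc;
};

/* Look up the commit array for 'path', creating an empty one on demand. */
static struct commit_array *submodule_commits(struct submodule_list *list,
					      const char *path)
{
	struct submodule_entry *entry;
	size_t i, len = strlen(path) + 1;

	for (i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].path, path))
			return list->items[i].commits;

	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 4;
		struct submodule_entry *items =
			realloc(list->items, alloc * sizeof(*items));
		if (!items)
			return NULL;
		list->items = items;
		list->alloc = alloc;
	}

	entry = &list->items[list->nr];
	entry->path = malloc(len);
	/* Allocated separately so the pointer stays valid when items grows. */
	entry->commits = calloc(1, sizeof(*entry->commits));
	if (!entry->path || !entry->commits) {
		free(entry->path);
		free(entry->commits);
		return NULL;
	}
	memcpy(entry->path, path, len);
	list->nr++;
	return entry->commits;
}

/* Append one commit name; returns 0 on success, -1 on allocation failure. */
static int commit_array_append(struct commit_array *array,
			       const unsigned char *sha1)
{
	if (array->nr == array->alloc) {
		size_t alloc = array->alloc ? 2 * array->alloc : 8;
		unsigned char (*grown)[20] =
			realloc(array->sha1, alloc * sizeof(*grown));
		if (!grown)
			return -1;
		array->sha1 = grown;
		array->alloc = alloc;
	}
	memcpy(array->sha1[array->nr++], sha1, 20);
	return 0;
}
```

A caller would write commit_array_append(submodule_commits(&list, path),
sha1) after checking for NULL, which shows why "get" in the name would
undersell what the returned array is used for.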

* Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
  2016-10-12 13:33     ` Heiko Voigt
@ 2016-10-12 17:37       ` Junio C Hamano
  2016-10-13 15:59         ` Heiko Voigt
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-10-12 17:37 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Jeff King, Stefan Beller, git, Jens.Lehmann, Fredrik Gustafsson,
	Leandro Lucarella

Heiko Voigt <hvoigt@hvoigt.net> writes:

>> If we do not even have these commits locally, then there is no point
>> attempting to push, so returning 0 (i.e. it is not "needs pushing"
>> situation) is correct but it is a bit subtle.  It's not "we know
>> they already have them", but it is "even if we tried to push, it
>> won't do us or the other side any good."  A single-liner in-code
>> comment may help.
>
> First the naming part. How about:
>
> 	submodule_has_commits()

Nice.

> Returning 0 here means: "No push needed" but the correct answer would
> be: "We do not know". 

Is it?  Perhaps I am misreading the "submodule-has-commits"; I
thought it was "the remote may or may not need updating, but we
ourselves don't have what they may need to have commits in their
submodule that are referenced by their superproject, so it would not
help them even if we pushed our submodule to them".  It indeed is
different from "No push needed" (rather, "our pushing would be
pointless").

> So how about:
>
>
> 	if (!submodule_has_hashes(path, hashes))
> 		/* NEEDSWORK: The correct answer here is "We do not
> 		 * know" instead of "No". We currently proceed pushing
> 		 * here as if the submodules commits are available on a
> 		 * remote, which is not always correct. */
> 		return 0;

I am not sure.  

What should happen in this scenario?

 * We have two remotes, A and B, for our superproject.

 * We are not interested in one submodule at path X.  Our repository
   is primarily used to work on the superproject and possibly other
   submodules but not the one at path X.

 * We pulled from A to update ourselves.  They were actively working
   on the submodule we are not interested in, and path X in the
   superproject records a new commit that we do not have.

 * We are now trying to push to B.

Should different things happen in these two subcases?

 - We are not interested in submodule at path X, so we haven't even
   done "submodule init" on it.

 - We are not interested in submodule at path X, so even though we
   do have a rather stale clone of it, we do not usually bother
   updating what is checked out at path X and commit our changes
   outside that area.

I tend to think that in these two cases the same thing should
happen.  I am not sure if that same thing should be rejection
(i.e. "you do not know for sure that the commit at path X of the
superproject you are pushing exists in the submodule repository at
the receiving end, so I'd refuse to push the superproject"), as
that would leave you with only one remedy: making a full
clone of the submodule you are not interested in and have never
touched yourself in either of these two subcases.



* Re: [PATCH v2 1/3] serialize collection of changed submodules
  2016-10-12 17:18         ` Junio C Hamano
@ 2016-10-13 15:27           ` Heiko Voigt
  0 siblings, 0 replies; 19+ messages in thread
From: Heiko Voigt @ 2016-10-13 15:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

On Wed, Oct 12, 2016 at 10:18:28AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
> > Which seems excessively long for a static function so how about
> > we shorten it a bit and add a comment:
> >
> > 	/* lookup or create commit object list for submodule */
> > 	get_commit_objects_for_submodule_path(...
> 
> Or you can even lose "get_" and "path", I guess.  You are not even
> "getting" commits but the array that holds them, so the caller can
> use it to "get" one of them or it can even use it to "put" a new
> one, no?  "get-commit-objects" is a misnomer in that sense.  Either
> one of
> 
>     get_submodule_commits_array()
>     submodule_commits()
> 
> perhaps?  I dunno.

I like the last one. Will use 'submodule_commits()'.

Cheers Heiko

* Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
  2016-10-12 17:37       ` Junio C Hamano
@ 2016-10-13 15:59         ` Heiko Voigt
  0 siblings, 0 replies; 19+ messages in thread
From: Heiko Voigt @ 2016-10-13 15:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Stefan Beller, git, Jens.Lehmann, Fredrik Gustafsson,
	Leandro Lucarella

On Wed, Oct 12, 2016 at 10:37:33AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
> >> If we do not even have these commits locally, then there is no point
> >> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> >> situation) is correct but it is a bit subtle.  It's not "we know
> >> they already have them", but it is "even if we tried to push, it
> >> won't do us or the other side any good."  A single-liner in-code
> >> comment may help.
> >
> > First the naming part. How about:
> >
> > 	submodule_has_commits()
> 
> Nice.

Ok will use that. And while I am at it: I will also rename all the
'hashes' variables to commits because that makes the code way clearer I
think.

> > Returning 0 here means: "No push needed" but the correct answer would
> > be: "We do not know". 
> 
> Is it?  Perhaps I am misreading the "submodule-has-commits"; I
> thought it was "the remote may or may not need updating, but we
> ourselves don't have what they may need to have commits in their
> submodule that are referenced by their superproject, so it would not
> help them even if we pushed our submodule to them".  It indeed is
> different from "No push needed" (rather, "our pushing would be
> pointless").

Yes you could also rephrase/see it that way. But the question is: If we
do not have what the remote needs would the user expect us to tell him
that fact and stop or does he usually not care?

> > So how about:
> >
> >
> > 	if (!submodule_has_hashes(path, hashes))
> > 		/* NEEDSWORK: The correct answer here is "We do not
> > 		 * know" instead of "No". We currently proceed pushing
> > 		 * here as if the submodules commits are available on a
> > 		 * remote, which is not always correct. */
> > 		return 0;
> 
> I am not sure.  
> 
> What should happen in this scenario?
> 
>  * We have two remotes, A and B, for our superproject.
> 
>  * We are not interested in one submodule at path X.  Our repository
>    is primarily used to work on the superproject and possibly other
>    submodules but not the one at path X.
> 
>  * We pulled from A to update ourselves.  They were actively working
>    on the submodule we are not interested in, and path X in the
>    superproject records a new commit that we do not have.
> 
>  * We are now trying to push to B.

I am not sure if this is a typical scenario? Well, if you are updating
your main branch from someone else and then push it to your own fork
maybe. You could specify --no-recurse-submodules for this case though.
The proper solution for this case would probably be something along the
lines of 'submodule.<name>.fetchRecurseSubmodules' but for push so we
can mark certain submodules as uninteresting by default.

I would like to be more protective of the user here. It's usually more
annoying for possibly many others when you push out things that have
missing pieces than it is for one person not being able to push because
his submodule is not up-to-date/initialized.

> Should different things happen in these two subcases?
> 
>  - We are not interested in submodule at path X, so we haven't even
>    done "submodule init" on it.
> 
>  - We are not interested in submodule at path X, so even though we
>    do have a rather stale clone of it, we do not usually bother
>    updating what is checked out at path X and commit our changes
>    outside that area.
> 
> I tend to think that in these two cases the same thing should
> happen.  I am not sure if that same thing should be rejection
> (i.e. "you do not know for sure that the commit at path X of the
> superproject you are pushing exists in the submodule repository at
> the receiving end, so I'd refuse to push the superproject"), as
> that would leave you with only one remedy: making a full
> clone of the submodule you are not interested in and have never
> touched yourself in either of these two subcases.

I also think in both situations the same thing should happen. A decision
that something different should happen should be made explicitly instead
of implicitly just because some submodule is not initialized. That might
be by accident or because a certain submodule is new so here the choice
should be made deliberately by the user, IMO.

Cheers Heiko

* Re: [PATCH v2 2/3] serialize collection of refs that contain submodule changes
  2016-10-12 13:10     ` Heiko Voigt
@ 2016-10-20 23:00       ` Stefan Beller
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2016-10-20 23:00 UTC (permalink / raw)
  To: Heiko Voigt
  Cc: Junio C Hamano, Jeff King, git@vger.kernel.org, Jens Lehmann,
	Fredrik Gustafsson, Leandro Lucarella

> Thanks. So I do not completely get what you are suggesting: args or keep
> it the way it is? Since in the end you are saying it is ok here ;) I
> mainly chose this name because I am substituting the argv variable which
> is already called 'argv' with this array. That might also be the reason
> why in so many locations with struct child_process we have the 'argv'
> name: Because they initially started with the old-style NULL terminated
> array.
>
> I am fine with it either way. Just tell me what you like :)

I think it's fine as is here; I was just confused when first seeing this code.

>
> Cheers Heiko

Thread overview: 19+ messages
2016-10-07 15:06 [PATCH v2 0/3] Speedup finding of unpushed submodules Heiko Voigt
2016-10-07 15:06 ` [PATCH v2 1/3] serialize collection of changed submodules Heiko Voigt
2016-10-07 17:59   ` Stefan Beller
2016-10-10 22:43     ` Junio C Hamano
2016-10-12 13:00       ` Heiko Voigt
2016-10-12 17:18         ` Junio C Hamano
2016-10-13 15:27           ` Heiko Voigt
2016-10-12 13:11     ` Heiko Voigt
2016-10-07 15:06 ` [PATCH v2 2/3] serialize collection of refs that contain submodule changes Heiko Voigt
2016-10-07 18:16   ` Stefan Beller
2016-10-12 13:10     ` Heiko Voigt
2016-10-20 23:00       ` Stefan Beller
2016-10-10 22:48   ` Junio C Hamano
2016-10-07 15:06 ` [PATCH v2 3/3] batch check whether submodule needs pushing into one call Heiko Voigt
2016-10-07 18:30   ` Stefan Beller
2016-10-10 22:56   ` Junio C Hamano
2016-10-12 13:33     ` Heiko Voigt
2016-10-12 17:37       ` Junio C Hamano
2016-10-13 15:59         ` Heiko Voigt
