git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Heiko Voigt <hvoigt@hvoigt.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Brandon Williams <bmwill@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH v4 2/3] implement fetching of moved submodules
Date: Tue, 17 Oct 2017 10:47:28 -0700	[thread overview]
Message-ID: <CAGZ79kZsQoU8wJk+i5aJOxFtsD=EWu_ycEPLM1KhTaOCWD7Y2w@mail.gmail.com> (raw)
In-Reply-To: <20171016135827.GC12756@book.hvoigt.net>

On Mon, Oct 16, 2017 at 6:58 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> We store the changed submodules paths to calculate which submodule needs
> fetching. This does not work for moved submodules since their paths do
> not stay the same in case of a moved submodules. In case of new
> submodules we do not have a path in the current checkout, since they
> just appeared in this fetch.
>
> It is more general to collect the submodule names for changes instead of
> their paths to include the above cases. If we do not have a
> configuration for a gitlink we rely on constructing a default name from
> the path if a git repository can be found at its path. We skip
> non-configured gitlinks whose default name collides with a configured
> one.

Thanks for working on this!

As detailed below, I wonder if it is easier (in maintenance, explaining
correctness, reviewing) if we'd rather keep two lists around. One for
based on names, and if we cannot lookup a name for a submodule, we
rather use the second path based list as a fallback. That would avoid
potential namespace collisions between names and paths, as well as
not having the confusion of mapping back and forth.

Most functions would then need to operate on path, as the name->path
mapping can be looked up for the first list, but the path->name mapping
cannot be looked up for the second list.

Cheers,
Stefan

> With the change described above we implement 'on-demand' fetching of
> changes in moved submodules.
>
> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>

> ---
>  submodule-config.h          |   3 +
>  submodule.c                 | 138 ++++++++++++++++++++++++++++++++------------
>  t/t5526-fetch-submodules.sh |  35 +++++++++++
>  3 files changed, 138 insertions(+), 38 deletions(-)
>
> diff --git a/submodule-config.h b/submodule-config.h
> index e3845831f6..a5503a5d17 100644
> --- a/submodule-config.h
> +++ b/submodule-config.h
> @@ -22,6 +22,9 @@ struct submodule {
>         int recommend_shallow;
>  };
>
> +#define SUBMODULE_INIT { NULL, NULL, NULL, RECURSE_SUBMODULES_NONE, \
> +       NULL, NULL, SUBMODULE_UPDATE_STRATEGY_INIT, {0}, -1 };
> +
>  struct submodule_cache;
>  struct repository;
>
> diff --git a/submodule.c b/submodule.c
> index 63e7094e16..71d1773e2e 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -21,7 +21,7 @@
>  #include "parse-options.h"
>
>  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
> -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
> +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
>  static int initialized_fetch_ref_tips;
>  static struct oid_array ref_tips_before_fetch;
>  static struct oid_array ref_tips_after_fetch;
> @@ -674,11 +674,11 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
>  }
>
>  static struct oid_array *submodule_commits(struct string_list *submodules,
> -                                          const char *path)
> +                                          const char *name)
>  {
>         struct string_list_item *item;
>
> -       item = string_list_insert(submodules, path);
> +       item = string_list_insert(submodules, name);
>         if (item->util)
>                 return (struct oid_array *) item->util;
>
> @@ -687,39 +687,67 @@ static struct oid_array *submodule_commits(struct string_list *submodules,
>         return (struct oid_array *) item->util;
>  }
>
> +struct collect_changed_submodules_cb_data {
> +       struct string_list *changed;
> +       const struct object_id *commit_oid;
> +};
> +
> +/*
> + * this would normally be two functions: default_name_from_path() and

Please start the comment capitalised. (minor nit)

> + * path_from_default_name(). Since the default name is the same as
> + * the submodule path we can get away with just one function which only
> + * checks whether there is a submodule in the working directory at that
> + * location.

This is an interesting comment, as it hints that we ought to keep it that way.
Earlier I was wondering if we want to make the name distinctively different
than its path, as that will confuse users *less* IMHO. (I just remember
someone asking on the mailing list why their "rename" did not work, as
they just renamed everything in the .gitmodules that looked like the path)

As the path/name is confusing, I'd wish we'd be super concise, such that
errors are harder to sneak into. For example, the arguments name should
be "path" as that is the only thing we can look up using is_sub_pop_gently,
if a "name" is given, than it just works because the chosen default name
was its path.

> +static const char *default_name_or_path(const char *path_or_name)
> +{
> +       int error_code;
> +
> +       if (!is_submodule_populated_gently(path_or_name, &error_code))
> +               return NULL;
> +
> +       return path_or_name;
> +}
> +


> +               if (submodule)
> +                       name = submodule->name;
> +               else {
> +                       name = default_name_or_path(p->two->path);

Here we use the path, as expected. So ideally we'd use
"default_name_for_path".


> +                       /* make sure name does not collide with existing one */
> +                       submodule = submodule_from_name(commit_oid, name);
> +                       if (submodule) {
> +                               warning("Submodule in commit %s at path: "
> +                                       "'%s' collides with a submodule named "
> +                                       "the same. Skipping it.",
> +                                       oid_to_hex(commit_oid), name);
> +                               name = NULL;
> +                       }

This is the ugly part of using one string list and storing names or
path in it. I wonder if we could omit this warning if we had 2 string lists?
One for names (which will then be used for renamed and new submodules)
and the "fall back" path based list. In such a world we would not need
to map back and forth between names and path.

> +               submodule = submodule_from_name(&null_oid, name->string);
> +               if (submodule)
> +                       path = submodule->path;
> +               else
> +                       path = default_name_or_path(name->string);
> +
> +               if (!path)
> +                       continue;


> +               submodule = submodule_from_name(&null_oid, name->string);
> +               if (submodule)
> +                       path = submodule->path;
> +               else
> +                       path = default_name_or_path(name->string);
> +
> +               if (!path)
> +                       continue;


>                 submodule = submodule_from_path(&null_oid, ce->name);
> +               if (!submodule) {
> +                       const char *name = default_name_or_path(ce->name);
> +                       if (name) {
> +                               default_submodule.path = default_submodule.name = name;
> +                               submodule = &default_submodule;
> +                       }
> +               }
>

>
> +test_expect_success "fetch new commits when submodule got renamed" '
> +       git clone . downstream_rename &&
> +       (
> +               cd downstream_rename &&
> +               git submodule update --init &&
> +# NEEDSWORK: we omitted --recursive for the submodule update here since
> +# that does not work. See test 7001 for mv "moving nested submodules"
> +# for details. Once that is fixed we should add the --recursive option
> +# here.
> +               git checkout -b rename &&
> +               git mv submodule submodule_renamed &&
> +               (
> +                       cd submodule_renamed &&
> +                       git checkout -b rename_sub &&
> +                       echo a >a &&
> +                       git add a &&
> +                       git commit -ma &&
> +                       git push origin rename_sub &&
> +                       git rev-parse HEAD >../../expect
> +               ) &&
> +               git add submodule_renamed &&
> +               git commit -m "update renamed submodule" &&
> +               git push origin rename
> +       ) &&
> +       (
> +               cd downstream &&
> +               git fetch --recurse-submodules=on-demand &&
> +               (
> +                       cd submodule &&
> +                       git rev-parse origin/rename_sub >../../actual
> +               )
> +       ) &&
> +       test_cmp expect actual
> +'
> +
>  test_done
> --
> 2.14.1.145.gb3622a4
>

  reply	other threads:[~2017-10-17 17:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-16 13:56 [PATCH v4 0/3] implement fetching of moved submodules Heiko Voigt
2017-10-16 13:57 ` [PATCH v4 1/3] fetch: add test to make sure we stay backwards compatible Heiko Voigt
2017-10-17 17:56   ` Stefan Beller
2017-10-16 13:58 ` [PATCH v4 2/3] implement fetching of moved submodules Heiko Voigt
2017-10-17 17:47   ` Stefan Beller [this message]
2017-10-18  0:03     ` Junio C Hamano
2017-10-18 17:56       ` Stefan Beller
2017-10-19  0:35         ` Junio C Hamano
2017-10-19 18:11           ` [PATCH 1/2] t5526: check for name/path collision in submodule fetch Stefan Beller
2017-10-19 18:11             ` [PATCH 2/2] fetch, push: keep separate lists of submodules and gitlinks Stefan Beller
2017-10-23 14:12               ` Heiko Voigt
2017-10-23 18:05                 ` Stefan Beller
2017-10-24  0:54                 ` Junio C Hamano
2017-10-23 14:16             ` [PATCH 1/2] t5526: check for name/path collision in submodule fetch Heiko Voigt
2017-10-23 17:58               ` Stefan Beller
2017-10-19 23:34           ` [PATCH v4 2/3] implement fetching of moved submodules Stefan Beller
2017-10-16 13:59 ` [PATCH v4 3/3] submodule: simplify decision tree whether to or not to fetch Heiko Voigt
2017-10-17 18:22   ` Stefan Beller
2017-10-18  0:03     ` Junio C Hamano
2017-10-18 18:03   ` Brandon Williams
2017-10-19  0:36     ` Junio C Hamano
2017-10-19 15:38       ` Heiko Voigt
2017-10-19 19:16         ` Brandon Williams
2017-10-17  1:49 ` [PATCH v4 0/3] implement fetching of moved submodules Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZsQoU8wJk+i5aJOxFtsD=EWu_ycEPLM1KhTaOCWD7Y2w@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).