git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Prathamesh Chavan <pc44800@gmail.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: [GSoC][PATCH v1] Disallow git commands from within unpopulated submodules
Date: Thu, 6 Apr 2017 11:15:43 -0700	[thread overview]
Message-ID: <CAGZ79kYULEaVYFX6_HGJi=CEA1E3Z8i4sKw+8bMaVsqiyWsuGw@mail.gmail.com> (raw)
In-Reply-To: <20170406060053.4453-1-pc44800@gmail.com>

On Wed, Apr 5, 2017 at 11:00 PM, Prathamesh Chavan <pc44800@gmail.com> wrote:
> The main motivations for disallowing git commands within an
> unpopulated submodule are:
>
> Whenever we run "git -C status" within an unpopulated submodule, it
> falls back to the superproject. This occurs since there is no .git
> file in the submodule directory. So superproject's status gets displayed.
> Also, the user's intention is not clear behind running the command
> in an unpopulated submodule. Hence we prefer to error out.
>
> When we run the command "git -C sub add ." within a submodule, the
> results observed are:
>
> In the case of the populated submodule, it acts like running “git add .“
> inside the submodule. This is uncontroversial and runs as expected.
>
> In the case of the unpopulated submodule, the user's intention behind
> entering the above command is unclear. He may have intended to add
> the submodule to the superproject or to add all files inside the
> sub/ directory to the submodule or superproject. Hence we’ll prefer
> to error out in these case.
>
> Eventually, we use a check_prefix_inside_submodule to see check if the
> path is inside an unpopulated submodule. If it is, then we report the
> user about the unpopulated submodule.
>
> Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
> ---
>
> Since this patch effectively uses RUN_SETUP, builtin commands like
> 'diff' and other non-builtin commands are not filtered.
> For such cases, I think, we need to handle them separately.
>
> Also since currently, git-submodule is not a builtin command, the
> command for initializing and updating the submodule doesn't return an
> error message, but once it is converted to builtin, we need to handle
> its case explicitly.
>
> The build report of this patch is available on:
> https://travis-ci.org/pratham-pc/git/builds/219030999
>
> Also, the above patch was initially my GSoC project topic, but I changed
> it later on and added these bug fixes to my wishlist of the proposal.

Thanks for picking this topic up. :)

A couple of weeks back I floated a similar proposal of a patch[1], but
as far as I remember Peff hinted that it is a bad UI to do it on such a
generic early level[2]. And you also mention here that we'd not affect
git-diff or other commands that do not have RUN_SETUP set.

[1] https://public-inbox.org/git/20170119193023.26837-1-sbeller@google.com/
[2] from the same thread as [1]:
  https://public-inbox.org/git/20170120191728.l3ne5tt5pwbmafjh@sigill.intra.peff.net/

And I agree with Peff here that this high level catching is not the best way to
go. Rather we'd have to go through each command, e.g. in git-status I could
imagine it could look like: (white space mangled):

    diff --git a/builtin/commit.c b/builtin/commit.c
    index 4e288bc513..e3c44d4ac4 100644
    --- a/builtin/commit.c
    +++ b/builtin/commit.c
    @@ -1328,6 +1328,23 @@ static int git_status_config(const char *k,
const char *v, void *cb)
         return git_diff_ui_config(k, v, NULL);
     }

    +static void print_warning_inside_submodule(int status_format,
const char *prefix)
    +{
    +    switch (status_format) {
    +    case STATUS_FORMAT_UNSPECIFIED: /* fall through */
    +    case STATUS_FORMAT_NONE:     /* fall through */
    +    case STATUS_FORMAT_LONG:
    +        printf(_("\n\nWARNING: \n\nIn uninitialized submodule
'%s'\n\n\n"), prefix);
    +    case STATUS_FORMAT_SHORT:
    +        printf(_("WARNING: In uninitialized submodule '%s'\n"), prefix);
    +    case STATUS_FORMAT_PORCELAIN:
    +        /* cannot encode the warning in porcelain v1. */
    +        break;
    +    case STATUS_FORMAT_PORCELAIN_V2:
    +        printf("# WARNING prefix in submodule\n");
    +    }
    +}
    +
     int cmd_status(int argc, const char **argv, const char *prefix)
     {
         static struct wt_status s;
    @@ -1380,6 +1397,9 @@ int cmd_status(int argc, const char **argv,
const char *prefix)
         read_cache_preload(&s.pathspec);
         refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED,
&s.pathspec, NULL, NULL);

    +    if (prefix && check_prefix_inside_submodule(prefix))
    +        print_warning_inside_submodule(status_format, prefix);
    +
         fd = hold_locked_index(&index_lock, 0);

         s.is_initial = get_sha1(s.reference, oid.hash) ? 1 : 0;

>
> +#define MODULE_LIST_INIT { NULL, 0, 0 }

I'd keep this initializer macro near the definition, i.e. in the header
(compare to STRBUF_INIT or STRING_LIST_INIT_* for example),
as this can then be used where ever we can use the data structure.

> +void check_prefix_inside_submodule(const char *prefix)

I think we'll need this function in returning way, i.e.

    int check_prefix_inside_submodule(const char *prefix)
    {
        ... do check ...
        return result;
    }

> +{
> +       struct module_list list = MODULE_LIST_INIT;
> +       int i;
> +
> +       if (read_cache() < 0)
> +               die(_("index file corrupt"));
> +
> +       for (i = 0; i < active_nr; i++) {
> +               const struct cache_entry *ce = active_cache[i];
> +
> +               if (!S_ISGITLINK(ce->ce_mode))
> +                               continue;
> +
> +               ALLOC_GROW(list.entries, list.nr + 1, list.alloc);
> +               list.entries[list.nr++] = ce;
> +               while (i + 1 < active_nr &&
> +                       !strcmp(ce->name, active_cache[i + 1]->name))
> +                        /*
> +                         * Skip entries with the same name in different stages
> +                         * to make sure an entry is returned only once.
> +                         */
> +                       i++;
> +       }

The code up to here seems to be partially duplicate of
submodule--helper.c#module_list_compute().

At first I tried coming up with a nice code deduplication (i.e. put
the essential parts as a function somewhere), but I think this doesn't
make sense from an algorithmic point of view, because this runs in O(n)
of active_cache.

active_cache is sorted by (1st) alphabet and (2nd) the index stage.
The second sorting is why we have the while loop with the comment
in the code.

The problem we are trying to solve here, is "Does the active_index contain
prefix?", which can be done in O(log n) with a binary search, because
active_index is sorted by alphabet.

So I am not sure how much code we can reuse here. nevertheless:

> +       for(i = 0; i < list.nr; i++) {

Style nit: We prefer a whitespace between a control statement
(for, if, while) and the opening parens, i.e. "for (i = .."

> +               if(strlen((*list.entries[i]).name) ==  strlen(prefix)) {

(*list.entries[i]).name

can be simplified. The dereference *, combined with an access
of the struct member can be done as ->.
(*foo).bar is equal to foo->bar.

> +               }
> +               else if(strlen((*list.entries[i]).name) ==  strlen(prefix)-1) {

The Git coding style prefers to have the closing brace and the else
on the same line:

        ..
    } else if (..) {
        ..


> +                       const char *out = NULL;
> +                       if(skip_prefix(prefix, (*list.entries[i]).name, &out)) {
> +                               if(strlen(out) == 1 && out[0] == '/')

The strlen operation can take quite long potentially (O(n) with n the
length of out).
So you could put the cheaper operation first, or the following would
check the same
without having to compute the length:

    if (out && out[0] == '/' && !out + 1)
        ..

> +                                       die(_("command from inside unpopulated submodule '%s' not supported."), (*list.entries[i]).name);

Once we have this function inside each command, we can be more precise
the error message. :)

Thanks,
Stefan

  parent reply	other threads:[~2017-04-06 18:15 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06  6:00 [GSoC][PATCH v1] Disallow git commands from within unpopulated submodules Prathamesh Chavan
2017-04-06  6:11 ` Prathamesh Chavan
2017-04-06 18:15 ` Stefan Beller [this message]
2017-04-06 20:48   ` Prathamesh Chavan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kYULEaVYFX6_HGJi=CEA1E3Z8i4sKw+8bMaVsqiyWsuGw@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=pc44800@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).