git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Prathamesh Chavan <pc44800@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: [GSoC][PATCH v1] Disallow git commands from within unpopulated submodules
Date: Fri, 7 Apr 2017 02:18:45 +0530	[thread overview]
Message-ID: <CAME+mvXbhW9Ya=FZEA8AmWihdn5ROVx6zAbsNYCEUEBY9p-wVQ@mail.gmail.com> (raw)
In-Reply-To: <CAGZ79kYULEaVYFX6_HGJi=CEA1E3Z8i4sKw+8bMaVsqiyWsuGw@mail.gmail.com>

> A couple of weeks back I floated a similar proposal of a patch[1], but
> as far as I remember Peff hinted that it is a bad UI to do it on such a
> generic early level[2]. And you also mention here that we'd not affect
> git-diff or other commands that do not have RUN_SETUP set.
>
> [1] https://public-inbox.org/git/20170119193023.26837-1-sbeller@google.com/
> [2] from the same thread as [1]:
>   https://public-inbox.org/git/20170120191728.l3ne5tt5pwbmafjh@sigill.intra.peff.net/
>
> And I agree with Peff here that this high level catching is not the best way to
> go. Rather we'd have to go through each command, e.g. in git-status I could
> imagine it could look like: (white space mangled):
>
>     diff --git a/builtin/commit.c b/builtin/commit.c
>     index 4e288bc513..e3c44d4ac4 100644
>     --- a/builtin/commit.c
>     +++ b/builtin/commit.c
>     @@ -1328,6 +1328,23 @@ static int git_status_config(const char *k,
> const char *v, void *cb)
>          return git_diff_ui_config(k, v, NULL);
>      }
>
>     +static void print_warning_inside_submodule(int status_format,
> const char *prefix)
>     +{
>     +    switch (status_format) {
>     +    case STATUS_FORMAT_UNSPECIFIED: /* fall through */
>     +    case STATUS_FORMAT_NONE:     /* fall through */
>     +    case STATUS_FORMAT_LONG:
>     +        printf(_("\n\nWARNING: \n\nIn uninitialized submodule
> '%s'\n\n\n"), prefix);
>     +    case STATUS_FORMAT_SHORT:
>     +        printf(_("WARNING: In uninitialized submodule '%s'\n"), prefix);
>     +    case STATUS_FORMAT_PORCELAIN:
>     +        /* cannot encode the warning in porcelain v1. */
>     +        break;
>     +    case STATUS_FORMAT_PORCELAIN_V2:
>     +        printf("# WARNING prefix in submodule\n");
>     +    }
>     +}
>     +
>      int cmd_status(int argc, const char **argv, const char *prefix)
>      {
>          static struct wt_status s;
>     @@ -1380,6 +1397,9 @@ int cmd_status(int argc, const char **argv,
> const char *prefix)
>          read_cache_preload(&s.pathspec);
>          refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED,
> &s.pathspec, NULL, NULL);
>
>     +    if (prefix && check_prefix_inside_submodule(prefix))
>     +        print_warning_inside_submodule(status_format, prefix);
>     +
>          fd = hold_locked_index(&index_lock, 0);
>
>          s.is_initial = get_sha1(s.reference, oid.hash) ? 1 : 0;
>
Yes, even after finishing my patch and finding out that it doesn't work
for various commands, high level catching doesn't seems correct. Also
after you mention in the status command itself, as we don't expect it to
error out in every case, hence making the change in every command
individually will be an appropriate choice. Also in this way, we can also
include the case of git-diff and also for the non-bultin commands.

>>
>> +#define MODULE_LIST_INIT { NULL, 0, 0 }
>
> I'd keep this initializer macro near the definition, i.e. in the header
> (compare to STRBUF_INIT or STRING_LIST_INIT_* for example),
> as this can then be used where ever we can use the data structure.
>
I was a little confused about its location too. But I agree with you
that as it can be used where ever we can later, keeping it in
header will be more appropriate.

>> +void check_prefix_inside_submodule(const char *prefix)
>
> I think we'll need this function in returning way, i.e.
>
>     int check_prefix_inside_submodule(const char *prefix)
>     {
>         ... do check ...
>         return result;
>     }
>
>> +{
>> +       struct module_list list = MODULE_LIST_INIT;
>> +       int i;
>> +
>> +       if (read_cache() < 0)
>> +               die(_("index file corrupt"));
>> +
>> +       for (i = 0; i < active_nr; i++) {
>> +               const struct cache_entry *ce = active_cache[i];
>> +
>> +               if (!S_ISGITLINK(ce->ce_mode))
>> +                               continue;
>> +
>> +               ALLOC_GROW(list.entries, list.nr + 1, list.alloc);
>> +               list.entries[list.nr++] = ce;
>> +               while (i + 1 < active_nr &&
>> +                       !strcmp(ce->name, active_cache[i + 1]->name))
>> +                        /*
>> +                         * Skip entries with the same name in different stages
>> +                         * to make sure an entry is returned only once.
>> +                         */
>> +                       i++;
>> +       }
>
> The code up to here seems to be partially duplicate of
> submodule--helper.c#module_list_compute().
>
> At first I tried coming up with a nice code deduplication (i.e. put
> the essential parts as a function somewhere), but I think this doesn't
> make sense from an algorithmic point of view, because this runs in O(n)
> of active_cache.
>
> active_cache is sorted by (1st) alphabet and (2nd) the index stage.
> The second sorting is why we have the while loop with the comment
> in the code.

Sorry, I was unaware about the active_cache being sorted, hence went
for linear search.

>
> The problem we are trying to solve here, is "Does the active_index contain
> prefix?", which can be done in O(log n) with a binary search, because
> active_index is sorted by alphabet.
>
> So I am not sure how much code we can reuse here. nevertheless:
>
>> +       for(i = 0; i < list.nr; i++) {
>
> Style nit: We prefer a whitespace between a control statement
> (for, if, while) and the opening parens, i.e. "for (i = .."
>
>> +               if(strlen((*list.entries[i]).name) ==  strlen(prefix)) {
>
> (*list.entries[i]).name
>
> can be simplified. The dereference *, combined with an access
> of the struct member can be done as ->.
> (*foo).bar is equal to foo->bar.
>
>> +               }
>> +               else if(strlen((*list.entries[i]).name) ==  strlen(prefix)-1) {
>
> The Git coding style prefers to have the closing brace and the else
> on the same line:
>
>         ..
>     } else if (..) {
>         ..
>

Sorry for being sloppy with the coding style. In all my future patches
I'll make sure I learn from all the mistakes I made so far and also
be more careful.

>
>> +                       const char *out = NULL;
>> +                       if(skip_prefix(prefix, (*list.entries[i]).name, &out)) {
>> +                               if(strlen(out) == 1 && out[0] == '/')
>
> The strlen operation can take quite long potentially (O(n) with n the
> length of out).
> So you could put the cheaper operation first, or the following would
> check the same
> without having to compute the length:
>
>     if (out && out[0] == '/' && !out + 1)
>         ..
>
>> +                                       die(_("command from inside unpopulated submodule '%s' not supported."), (*list.entries[i]).name);
>
> Once we have this function inside each command, we can be more precise
> the error message. :)

Thanks for all the advices on this patch. I'll implement the above changes
in a new patch series.
The approach that I'll take will be as follows:
1. Create function check_prefix_inside_submodule which returns value
   accordingly. Also this function is implemented by checking the active_cache
   list and checking if it prefix is present in active_cache[i]->name,
   using binary search.
   Also is there some other way which is more quicker than searching
   for the prefix in the cache?
2. Individually call this in each command after the git environment is set
   and options are parsed.
3. Apply this change for appropriate options only, as suggested above for
   the status command.
This will ensure more accuracy.
I have also mentioned to work on this BUG in my git proposal as well, but I
kept it in my wishlist section. Hence, I'll continue to work on this as my time
permits. Currently, I'm also working on converting the git-submodule
subcommand 'foreach' from script to builtin, by first converting it
to a function in submodule--helper.c and then later converting it
to builtin.

Thanks,
Prathamesh Chavan

      reply	other threads:[~2017-04-06 20:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06  6:00 [GSoC][PATCH v1] Disallow git commands from within unpopulated submodules Prathamesh Chavan
2017-04-06  6:11 ` Prathamesh Chavan
2017-04-06 18:15 ` Stefan Beller
2017-04-06 20:48   ` Prathamesh Chavan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAME+mvXbhW9Ya=FZEA8AmWihdn5ROVx6zAbsNYCEUEBY9p-wVQ@mail.gmail.com' \
    --to=pc44800@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).