From: Shourya Shukla <shouryashukla.oo@gmail.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
gitster@pobox.com, liu.denton@gmail.com,
kaartic.sivaraam@gmail.com, pc44800@gmail.com
Subject: Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
Date: Sun, 5 Jul 2020 23:04:58 +0530 [thread overview]
Message-ID: <20200705173458.GA5204@konoha> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.2007031712160.50@tvgsbejvaqbjf.bet>
Hello Dscho!
Before replying, one thing I want to clarify is that I ask various
things for double-checking since if I get even the slightest confusion I
start overthinking and destroying it all for myself. Please bear with me
and confirm/clarify stuff wherever possible. Would be of great help! :)
> Hi Shourya,
>
> [exchanging Stefan Beller's dysfunct @google address for their private
> one; I encourage you to do the same in the next iteration, probably
> by editing the `Mentored-by:` line.]
I think you missed to mention it.
> On Fri, 3 Jul 2020, Shourya Shukla wrote:
>
> > From: Prathamesh Chavan <pc44800@gmail.com>
> >
> > The submodule subcommand 'summary' is ported in the process of
> > making git-submodule a builtin. The function cmd_summary() from
> > git-submodule.sh is ported to functions module_summary(),
> > compute_summary_module_list(), prepare_submodule_summary() and
> > generate_submodule_summary(), print_submodule_summary().
> >
> > The first function module_summary() parses the options of submodule
> > subcommand and also acts as the front-end of this subcommand.
> > After parsing them, it calls the compute_summary_module_list()
>
> Missing full-stop, and probably the sentence also wanted to say "function"
> at the end.
I will correct. Thanks for pointing out!
> > The functions compute_summary_module_list() runs the diff_cmd,
> > and generates the modules list, as required by the subcommand.
> > The generation of this module list is done by the using the
>
> s/the using/using/
Will amend!
> > callback function submodule_summary_callback(), and stored in the
> > structure module_cb.
>
> This explains nicely what the patch does. But the commit message should
> not really repeat what can be readily deduced from the patch; It should
> focus on the motivation and on interesting background information that is
> _not_ readily deduced from the patch.
I understand. I will follow your suggestions regarding my patch.
> For example, I see that `$diff_cmd` is called twice in the shell script
> version, once to "get modified modules cared by user" and then _again_,
> with that list of modified modules.
>
> I would have liked to see a reasoning in the commit message that explains
> why this has to be so in the C version. I get why it is complicated in a
> shell script (which lacks proper objects, after all), but I would have
> expected the C version to be able to accumulate the information with a
> single pass.
>
> (Before writing the following paragraph, I actually reviewed the patch
> from bottom to top, in the caller->callee direction.)
>
> Ah. I see that this indeed is the case: there is only one pass in the C
> version. That's a useful piece of metadata for the commit message, I
> think, much more useful than describing the call tree of the functions.
Yup that it worth mentioning.
> Another thing worth mentioning in the commit message is that we use the
> combination of setting a child_process' working directory to the submodule
> path and then calling `prepare_submodule_repo_env()` which also sets
> `GIT_DIR` to `.git`, so that we can be certain that those spawned
> processes will not access the superproject's ODB by mistake.
>
> When reading my suggestions, please keep in mind that I reviewed the
> functions in caller->callee order, i.e. I started at the end of the patch
> and then worked my way up.
>
> All in all, I like the function structure, but I think there is still a
> bit room for improvement in a v2.
> > +static int verify_submodule_object_name(const char *sm_path,
> > + const char *sha1)
> > +{
> > + struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > + > > + cp_rev_parse.git_cmd = 1;
> > + cp_rev_parse.no_stdout = 1;
> > + cp_rev_parse.dir = sm_path;
>
> So here we specify `sm_path` as current working directory.
>
> > + prepare_submodule_repo_env(&cp_rev_parse.env_array);
>
> And this implicitly sets `GIT_DIR=.git`. Good.
>
> > + argv_array_pushl(&cp_rev_parse.args, "rev-parse", "-q",
> > + "--verify", NULL);
> > + argv_array_pushf(&cp_rev_parse.args, "%s^0", sha1);
>
> After this, we should also append `--` to make sure that we're not parsing
> this as a file name.
Will do!
> Two comments about naming: `sha1` is pretty misleading here, as we do not
> require it to be a SHA-1 (especially in the future in which we switch to
> SHA-256). Besides, what we're really asking for (via that `^0` suffix) is
> a committish. Therefore, I would propose to use `committish` both in the
> parameter name as well as the function name.
I am not aware of this change. I will take this suggestion into account.
> > +
> > + if (run_command(&cp_rev_parse))
> > + return 1;
> > +
> > + return 0;
> > +}
> > +
> > +static void print_submodule_summary(struct summary_cb *info, int errmsg,
> > + int total_commits, int missing_src,
> > + int missing_dst, const char *displaypath,
> > + int is_sm_git_dir, struct module_cb *p)
> > +{
> > + if (p->status == 'T') {
> > + if (S_ISGITLINK(p->mod_dst))
> > + printf(_("* %s %s(blob)->%s(submodule)"),
> > + displaypath, find_unique_abbrev(&p->oid_src, 7),
>
> The shell script version does this:
>
> sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
> echo $sha1_src | cut -c1-7)
>
> That is not quite the same, as it looks for the abbreviation _in the
> submodule_, not in the current project. So I think `find_unique_abbrev()`
> is not correct here.
>
> The funny thing is that we _already_ will have called `git rev-parse
> --verify` for both `p->oid_src` and `p->oid_dst` in the submodule, in the
> caller of this function! And while we throw away the result, and while we
> do not pass `--short`, there is no reason why we shouldn't be able to do
> precisely that.
Okay so you are saying that there is no need of a 'find_unique_abbrev()'
since we would be easily able to obtain these values from the caller of
'print_submodule_summary()' right? Maybe we can pass 'oid_src' or
'oid_dst' as an argument?
> > + find_unique_abbrev(&p->oid_dst, 7));
> > + else
> > + printf(_("* %s %s(submodule)->%s(blob)"),
> > + displaypath, find_unique_abbrev(&p->oid_src, 7),
> > + find_unique_abbrev(&p->oid_dst, 7));
> > + } else {
> > + printf("* %s %s...%s",
> > + displaypath, find_unique_abbrev(&p->oid_src, 7),
> > + find_unique_abbrev(&p->oid_dst, 7));
> > + }
> > +
> > + if (total_commits < 0)
> > + printf(":\n");
> > + else
> > + printf(" (%d):\n", total_commits);
> > +
> > + if (errmsg) {
> > + /*
> > + * Don't give error msg for modification whose dst is not
> > + * submodule, i.e. deleted or changed to blob
> > + */
> > + if (S_ISGITLINK(p->mod_src)) {
> > + if (missing_src && missing_dst) {
> > + printf(_(" Warn: %s doesn't contain commits %s and %s\n"),
> > + displaypath, oid_to_hex(&p->oid_src),
> > + oid_to_hex(&p->oid_dst));
> > + } else if (missing_src) {
> > + printf(_(" Warn: %s doesn't contain commit %s\n"),
> > + displaypath, oid_to_hex(&p->oid_src));
> > + } else {
> > + printf(_(" Warn: %s doesn't contain commit %s\n"),
> > + displaypath, oid_to_hex(&p->oid_dst));
> > + }
> > + }
> > + } else if (is_sm_git_dir) {
> > + struct child_process cp_log = CHILD_PROCESS_INIT;
> > +
> > + cp_log.git_cmd = 1;
> > + cp_log.dir = p->sm_path;
> > + prepare_submodule_repo_env(&cp_log.env_array);
>
> Since the working directory is set to the top-level directory of the
> submodule, and since `prepare_submodule_repo_env()` sets `GIT_DIR` to
> `.git`, I think that the `is_sm_git_dir` condition is unnecessary. In
> fact, the entire `is_sm_git_dir` parameter (and local variable in the
> caller, see more on that below) can go away.
Because we already set the $GIT_DIR to .git/ so an extra check will not
be necessary right?
> > + argv_array_pushl(&cp_log.args, "log", NULL);
> > +
> > + if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
> > + if (info->summary_limit > 0)
> > + argv_array_pushf(&cp_log.args, "-%d",
> > + info->summary_limit);
> > +
> > + argv_array_pushl(&cp_log.args, "--pretty= %m %s",
> > + "--first-parent", NULL);
> > + argv_array_pushf(&cp_log.args, "%s...%s",
> > + oid_to_hex(&p->oid_src),
> > + oid_to_hex(&p->oid_dst));
> > + } else if (S_ISGITLINK(p->mod_dst)) {
> > + argv_array_pushl(&cp_log.args, "--pretty= > %s",
> > + "-1", oid_to_hex(&p->oid_dst), NULL);
> > + } else {
> > + argv_array_pushl(&cp_log.args, "--pretty= < %s",
> > + "-1", oid_to_hex(&p->oid_src), NULL);
> > + }
> > + run_command(&cp_log);
> > + }
> > + printf("\n");
> > +}
>
> It looks as if there is a whole lot of `oid_to_hex(&p->oid_src)` in that
> function. Together with the realization that we need the abbreviated
> version of that at least in one place, and the other realization that we
> already call `rev-parse --verify` for both `oid_src` and `oid_dst` in the
> caller of this function, it seems to suggest itself that we would actually
> want to pass the `--short` option, too, and to capture the output, and
> pass it down to `print_submodule_summary()` _instead of_ `missing_src` and
> `missing_dst` (e.g. as `src_abbrev` and `dst_abbrev`).
Oh you have mentioned it here too. This seems quite a good approach. I
will adopt this.
> > +
> > +static void generate_submodule_summary(struct summary_cb *info,
> > + struct module_cb *p)
> > +{
> > + int missing_src = 0;
> > + int missing_dst = 0;
> > + char *displaypath;
> > + int errmsg = 0;
> > + int total_commits = -1;
> > + int is_sm_git_dir = 0;
> > + struct strbuf sm_git_dir_sb = STRBUF_INIT;
> > +
> > + if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
> > + if (S_ISGITLINK(p->mod_dst)) {
> > + /*
> > + * NEEDSWORK: avoid using separate process with
> > + * the help of the function head_ref_submodule()
>
> I don't quite understand this comment. There is no `head_ref_submodule()`
> function.
>
> > + */
> > + struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > + struct strbuf sb_rev_parse = STRBUF_INIT;
> > +
> > + cp_rev_parse.git_cmd = 1;
> > + cp_rev_parse.no_stderr = 1;
> > + cp_rev_parse.dir = p->sm_path;
> > + prepare_submodule_repo_env(&cp_rev_parse.env_array);
> > +
> > + argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> > + "HEAD", NULL);
> > + if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> > + strbuf_strip_suffix(&sb_rev_parse, "\n");
> > + get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> > + }
> > + strbuf_release(&sb_rev_parse);
> > + } else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
> > + struct child_process cp_hash_object = CHILD_PROCESS_INIT;
> > + struct strbuf sb_hash_object = STRBUF_INIT;
> > +
> > + cp_hash_object.git_cmd = 1;
> > + argv_array_pushl(&cp_hash_object.args, "hash-object",
> > + p->sm_path, NULL);
> > + if (!capture_command(&cp_hash_object,
> > + &sb_hash_object, 0)) {
> > + strbuf_strip_suffix(&sb_hash_object, "\n");
> > + get_oid_hex(sb_hash_object.buf, &p->oid_dst);
> > + }
> > + strbuf_release(&sb_hash_object);
>
> It would probably be shorter, less error-prone, and quicker to use
> `index_fd()` directly.
>
> BTW I am not quite sure that this code does the correct thing in case of a
> symlink: it hashes the contents of the symlink target (if it is a file,
> otherwise it errors out). But that is hardly an issue introduced by the
> conversion, that's just copied from `git-submodule.sh`.
>
> > + } else {
> > + if (p->mod_dst)
> > + die(_("unexpected mode %d\n"), p->mod_dst);
>
> Hmm. This does not match what the shell script version does:
>
> *)
> # unexpected type
> eval_gettextln "unexpected mode \$mod_dst" >&2
> continue ;;
>
> I think we should also just write the message to `stderr` and continue,
> not `die()`.
>
> In addition to that, I am missing the C code for this case:
>
> 000000)
> ;; # removed
>
> It is quite possible that our test suite does not cover this case (or did
> the test suite fail for you?). If that is indeed the case, it would be
> really good to add a test case as part of this patch series, to gain
> confidence in the correctness of the conversion.
The tests passed for me actually. Whether this is covered by the test
cases, I am not sure. I will have to check it.
> > + }
> > + }
> > +
> > + strbuf_addstr(&sm_git_dir_sb, p->sm_path);
>
> I have to admit that I am not loving the name `sm_git_dir_sb`. Why not
> `submodule_git_dir`? I guess you copied it from elsewhere in
> `submodule--helper.c`...
>
> > + if (is_nonbare_repository_dir(&sm_git_dir_sb))
> > + is_sm_git_dir = 1;
>
> So here, we verify whether there is a repository at `p->sm_path`. I don't
> see that in the shell script version:
>
> missing_src=
> missing_dst=
>
> test $mod_src = 160000 &&
> ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
> missing_src=t
>
> test $mod_dst = 160000 &&
> ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
> missing_dst=t
>
> Let's read a bit further.
>
> > +
> > + if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
> > + missing_src = verify_submodule_object_name(p->sm_path,
> > + oid_to_hex(&p->oid_src));
>
> Ah, and `verify_submodule_object_name()` uses `p->sm_path` as working
> directory. But that's not what the shell script version did: it specified
> the `GIT_DIR` explicitly.
>
> And by using the `prepare_submodule_repo_env()` function in
> `verify_submodule_object_name()`, we specify `GIT_DIR` implicitly, as I
> pointed out in my comment on that function.
Oh so you're saying that it will be better to call
'prepare_submodule_repo_env()' on some variable since we explicitly want to
store the path to GIT_DIR?
It would be of much help if you could explain this part just a little
more (for my own sake).
> So I think that `is_sm_git_dir` might be
I think you missed something here...
> > +
> > + if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
> > + missing_dst = verify_submodule_object_name(p->sm_path,
> > + oid_to_hex(&p->oid_dst));
> > +
> > + displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
> > +
> > + if (!missing_dst && !missing_src) {
> > + if (is_sm_git_dir) {
> > + struct child_process cp_rev_list = CHILD_PROCESS_INIT;
> > + struct strbuf sb_rev_list = STRBUF_INIT;
> > + char *range;
> > +
> > + if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
> > + range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
> > + oid_to_hex(&p->oid_dst));
> > + else if (S_ISGITLINK(p->mod_src))
> > + range = xstrdup(oid_to_hex(&p->oid_src));
> > + else
> > + range = xstrdup(oid_to_hex(&p->oid_dst));
> > +
> > + cp_rev_list.git_cmd = 1;
> > + cp_rev_list.dir = p->sm_path;
> > + prepare_submodule_repo_env(&cp_rev_list.env_array);
>
> Again, due to setting the working directory to `p->sm_path` and
> (implicitly, via `prepare_submodule_repo_env()`) `GIT_DIR` to `.git`, I do
> not think that we have to guard this block beind `is_sm_git_dir`.
> > +
> > + argv_array_pushl(&cp_rev_list.args, "rev-list",
> > + "--first-parent", range, "--", NULL);
>
> Since `argv_array_push()` duplicates the strings, anyway, we can totally
> avoid the need for the `range` variable:
>
> if (IS_GITLINK(p->mod_src) && IS_GITLINK(p->mod_dst))
> argv_array_pushf(&cp_rev_list.args, "%s...%s",
> oid_to_hex(&p->oid_src),
> oid_to_hex(&p->oid_dst));
> else
> argv_array_push(&cp_rev_list.args, IS_GITLINK(p->mod_src) ?
> oid_to_hex(&p->oid_src) :
> oid_to_hex(&p->oid_dst));
>
> > + if (!capture_command(&cp_rev_list, &sb_rev_list, 0)) {
> > + if (sb_rev_list.len)
> > + total_commits = count_lines(sb_rev_list.buf,
> > + sb_rev_list.len);
>
> That's actually not necessary. `git rev-list --count` will give you a nice
> number, no need to capture a potentially large amount of memory only to
> count the lines.
>
> This may also make the patch obsolete that makes `count_lines()` public.
Therefore we eliminate count_lines() from here and instead do a 'git
rev-list --count'?
> > + else
> > + total_commits = 0;
> > + }
>
> > +
> > + free(range);
> > + strbuf_release(&sb_rev_list);
> > + }
> > + } else {
> > + errmsg = 1;
> > + }
>
> I am missing the equivalent for these lines here:
>
> t,)
> errmsg="$(eval_gettext " Warn: \$display_name doesn't contain commit \$sha1_src")"
> ;;
> ,t)
> errmsg="$(eval_gettext " Warn: \$display_name doesn't contain commit \$sha1_dst")"
> ;;
> t,t)
> errmsg="$(eval_gettext " Warn: \$display_name doesn't contain commits \$sha1_src and \$ ha1_dst")"
> ;;
I will add them.
> I am not quite sure whether it is a good idea to leave it to the
> `print_submodule_summary()` function to generate the `errmsg`. I think I'd
> rather have it a `char *` than an `int`.
Would it be better to add these error messages in
'prepare_submodule_summary()'? If we have error messages as integers
then we will simply
> > +
> > + print_submodule_summary(info, errmsg, total_commits,
> > + missing_src, missing_dst,
> > + displaypath, is_sm_git_dir, p);
> > +
> > + free(displaypath);
> > + strbuf_release(&sm_git_dir_sb);
> > +}
> > +
> > +static void prepare_submodule_summary(struct summary_cb *info,
> > + struct module_cb_list *list)
> > +{
> > + int i;
> > + for (i = 0; i < list->nr; i++) {
> > + struct module_cb *p = list->entries[i];
> > + struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > +
> > + if (p->status == 'D' || p->status == 'T') {
> > + generate_submodule_summary(info, p);
> > + continue;
> > + }
> > +
> > + if (info->for_status) {
> > + char *config_key;
>
> Since the `config_key` is only used within the `if()` block it would be
> better to declare it within that block.
>
> > + const char *ignore_config = "none";
>
> Since the only value we ever care about is "all", how about turning this
> into an `int`, setting it to `0` here, and later assigning it to
> `!strcmp(value, "all")` and `!strcmp(sub->ignore, "all")`, respectively?
Alright will do!
> I mean, I get it. Unix shell scripts are all about passing around text.
> And it is much easier to just translate that to C faithfully. But that
> does not make it good C code. C has data types, and proper C code makes
> use of that.
>
> > + const char *value;
>
> If you want to save on lines, you can cuddle this together with other
> declarations of the same type. Even so, it could be scoped more narrowly.
>
> > + const struct submodule *sub = submodule_from_path(the_repository,
> > + &null_oid,
> > + p->sm_path);
> > +
> > + if (sub && p->status != 'A') {
>
> Good. The shell script version _always_ retrieved the `.ignore` config
> value, even if the `status` is `A`. Your version is much better.
>
> But why bother calling `submodule_from_path()` if the status is `A`?
What exactly does a status of 'A' or 'T' mean? I mean I know what we are
doing but what exactly do these translate into?
> I could actually see the `const struct submodule *sub;` declaration be
> pulled out of this scope, and combining the `if (info->for_status &&
> p->status != 'A'), and the moving the assignment of `sub` into the `else
> if ((sub = submodule_from_path(r, &null_oid, p->sm_path)) &&
> sub->ignore)`.
>
> That would save us one entire indentation level.
That seems a good approach! I will try this out.
> > + config_key = xstrfmt("submodule.%s.ignore",
> > + sub->name);
> > + if (!git_config_get_string_const(config_key, &value))
> > + ignore_config = value;
> > + else if (sub->ignore)
> > + ignore_config = sub->ignore;
> > +
> > + free(config_key);
> > + if (!strcmp(ignore_config, "all"))
> > + continue;
> > + }
> > + }
> > +
> > + /* Also show added or modified modules which are checked out */
> > + cp_rev_parse.dir = p->sm_path;
> > + cp_rev_parse.git_cmd = 1;
> > + cp_rev_parse.no_stderr = 1;
> > + cp_rev_parse.no_stdout = 1;
> > +
> > + argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> > + "--git-dir", NULL);
> > +
> > + if (!run_command(&cp_rev_parse))
>
> I wonder whether we really need to waste an entire spawned process on
> figuring out whether `p->sm_path` refers to an active repository. Wouldn't
> `is_submodule_active(r, p->sm_path)` fulfill the same purpose?
Yep! This is correct. I will change.
> > + generate_submodule_summary(info, p);
> > + }
> > +}
> > +
> > +static void submodule_summary_callback(struct diff_queue_struct *q,
> > + struct diff_options *options,
> > + void *data)
> > +{
> > + int i;
> > + struct module_cb_list *list = data;
> > + for (i = 0; i < q->nr; i++) {
> > + struct diff_filepair *p = q->queue[i];
> > + struct module_cb *temp;
> > +
> > + if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
> > + continue;
> > + temp = (struct module_cb*)malloc(sizeof(struct module_cb));
> > + temp->mod_src = p->one->mode;
> > + temp->mod_dst = p->two->mode;
> > + temp->oid_src = p->one->oid;
> > + temp->oid_dst = p->two->oid;
> > + temp->status = p->status;
> > + temp->sm_path = xstrdup(p->one->path);
> > +
> > + ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
> > + list->entries[list->nr++] = temp;
> > + }
> > +}
> > +
> > +static const char *get_diff_cmd(enum diff_cmd diff_cmd)
> > +{
> > + switch (diff_cmd) {
> > + case DIFF_INDEX: return "diff-index";
> > + case DIFF_FILES: return "diff-files";
> > + default: BUG("bad diff_cmd value %d", diff_cmd);
> > + }
> > +}
> > +
> > +static int compute_summary_module_list(char *head,
> > + struct summary_cb *info,
> > + enum diff_cmd diff_cmd)
> > +{
> > + struct argv_array diff_args = ARGV_ARRAY_INIT;
> > + struct rev_info rev;
> > + struct module_cb_list list = MODULE_CB_LIST_INIT;
> > +
> > + argv_array_push(&diff_args, get_diff_cmd(diff_cmd));
> > + if (info->cached)
> > + argv_array_push(&diff_args, "--cached");
> > + argv_array_pushl(&diff_args, "--ignore-submodules=dirty", "--raw",
> > + NULL);
> > + if (head)
> > + argv_array_push(&diff_args, head);
> > + argv_array_push(&diff_args, "--");
> > + if (info->argc)
> > + argv_array_pushv(&diff_args, info->argv);
> > +
> > + git_config(git_diff_basic_config, NULL);
> > + init_revisions(&rev, info->prefix);
> > + rev.abbrev = 0;
> > + precompose_argv(diff_args.argc, diff_args.argv);
> > +
> > + diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
> > + &rev, NULL);
> > + rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
> > + rev.diffopt.format_callback = submodule_summary_callback;
> > + rev.diffopt.format_callback_data = &list;
> > +
> > + if (!info->cached) {
> > + if (diff_cmd == DIFF_INDEX)
>
> Please substitute the double-space by a single one.
Will do!
> > + setup_work_tree();
> > + if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
> > + perror("read_cache_preload");
> > + return -1;
> > + }
> > + } else if (read_cache() < 0) {
> > + perror("read_cache");
> > + return -1;
> > + }
> > +
> > + if (diff_cmd == DIFF_INDEX)
> > + run_diff_index(&rev, info->cached);
> > + else
> > + run_diff_files(&rev, 0);
> > + prepare_submodule_summary(info, &list);
> > + return 0;
> > +}
> > +
> > +static int module_summary(int argc, const char **argv, const char *prefix)
> > +{
> > + struct summary_cb info = SUMMARY_CB_INIT;
> > + int cached = 0;
> > + int for_status = 0;
> > + int quiet = 0;
> > + int files = 0;
> > + int summary_limit = -1;
> > + struct child_process cp_rev = CHILD_PROCESS_INIT;
> > + struct strbuf sb = STRBUF_INIT;
> > + enum diff_cmd diff_cmd = DIFF_INDEX;
> > + int ret;
> > +
> > + struct option module_summary_options[] = {
> > + OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
> > + OPT_BOOL(0, "cached", &cached,
> > + N_("Use the commit stored in the index instead of the submodule HEAD")),
> > + OPT_BOOL(0, "files", &files,
> > + N_("To compare the commit in the index with that in the submodule HEAD")),
> > + OPT_BOOL(0, "for-status", &for_status,
> > + N_("Skip submodules with 'ignore_config' value set to 'all'")),
> > + OPT_INTEGER('n', "summary-limit", &summary_limit,
> > + N_("Limit the summary size")),
> > + OPT_END()
> > + };
> > +
> > + const char *const git_submodule_helper_usage[] = {
> > + N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
> > + NULL
> > + };
> > +
> > + argc = parse_options(argc, argv, prefix, module_summary_options,
> > + git_submodule_helper_usage, 0);
> > +
> > + if (!summary_limit)
> > + return 0;
> > +
> > + cp_rev.git_cmd = 1;
> > + argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> > + argc ? argv[0] : "HEAD", NULL);
>
> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
will save us a ton of processes?
But I think we do need to capture the output of 'git rev-parse --verify
....' so I think it will backfire to use 'get_oid()' or am I just being
too dumb and not catching on something?
> > +
> > + if (!capture_command(&cp_rev, &sb, 0)) {
> > + strbuf_strip_suffix(&sb, "\n");
> > + if (argc) {
> > + argv++;
> > + argc--;
> > + }
> > + } else if (!argc || !strcmp(argv[0], "HEAD")) {
> > + /* before the first commit: compare with an empty tree */
> > + struct stat st;
> > + struct object_id oid;
> > + if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> > + prefix, 3))
> > + die("Unable to add %s to database", oid.hash);
>
> Umm. The original reads:
>
> # before the first commit: compare with an empty tree
> head=$(git hash-object -w -t tree --stdin </dev/null)
>
> It does not actually read from `stdin`. It reads from `/dev/null`,
> redirected to the input. And what it _actually_ does is to generate the
> OID of the empty tree.
>
> But we already _have_ the OID of the empty tree! It's
> `the_hash_algo->empty_tree`.
I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> I hope that this is covered by the test suite. Please check that. The test
> would succeed with your version, but only because tests are run with
> `stdin` redirected from `/dev/null` by default.
I guess yes. My work passed because the tests are written this way.
> > + strbuf_addstr(&sb, oid_to_hex(&oid));
> > + if (argc) {
> > + argv++;
> > + argc--;
> > + }
> > + } else {
> > + strbuf_addstr(&sb, "HEAD");
> > + }
>
> The conversion to C would make for a fine excuse to simplify the logic.
This was kind of like the 'shift' in shell. What equivalent do you
suggest?
> > + if (files) {
> > + if (cached)
> > + die(_("--cached and --files are mutually exclusive"));
> > + diff_cmd = DIFF_FILES;
> > + }
> > +
> > + info.argc = argc;
> > + info.argv = argv;
> > + info.prefix = prefix;
> > + info.cached = !!cached;
> > + info.for_status = !!for_status;
> > + info.quiet = quiet;
> > + info.files = files;
> > + info.summary_limit = summary_limit;
> > +
> > + ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : sb.buf,
> > + &info, diff_cmd);
>
> It would be better to pass the OID as `struct object_id *`, not as string.
Will do!
> Other than that, this patch nicely follows previous conversions from Unix
> shell scripts to C.
>
> Well done,
> Johannes
Thank you! It was a highly detailed review! I am still learning tons of
stuff about Git's code and such a review does help a lot! :)
next prev parent reply other threads:[~2020-07-05 17:35 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-02 19:24 [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C Shourya Shukla
2020-07-02 19:24 ` [PATCH 1/4] submodule: amend extra line feed between callback struct and macro Shourya Shukla
2020-07-03 14:57 ` Johannes Schindelin
2020-07-03 15:37 ` Philip Oakley
2020-07-04 15:21 ` Shourya Shukla
2020-07-04 15:39 ` Đoàn Trần Công Danh
2020-07-05 10:52 ` Philip Oakley
2020-07-02 19:24 ` [PATCH 2/4] submodule: rename helper functions to avoid ambiguity Shourya Shukla
2020-07-02 19:24 ` [PATCH 3/4] diff: change scope of the function count_lines() Shourya Shukla
2020-07-03 15:07 ` Johannes Schindelin
2020-07-04 15:28 ` Shourya Shukla
2020-07-04 21:46 ` Christian Couder
2020-07-02 19:24 ` [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C Shourya Shukla
2020-07-03 20:46 ` Johannes Schindelin
2020-07-05 17:34 ` Shourya Shukla [this message]
2020-07-06 9:16 ` Kaartic Sivaraam
2020-07-06 11:15 ` Shourya Shukla
2020-07-12 0:46 ` Johannes Schindelin
2020-07-15 14:53 ` Shourya Shukla
2020-07-15 18:41 ` Shourya Shukla
2020-08-10 13:24 ` Johannes Schindelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200705173458.GA5204@konoha \
--to=shouryashukla.oo@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kaartic.sivaraam@gmail.com \
--cc=liu.denton@gmail.com \
--cc=pc44800@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).