git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Daniel Ferreira <bnmvco@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH 2/3] add--interactive: add builtin helper for interactive add
Date: Sat, 6 May 2017 00:30:12 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.1.1705052328380.146734@virtualbox> (raw)
In-Reply-To: <1494009820-2090-3-git-send-email-bnmvco@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 13538 bytes --]

Hi Daniel,


On Fri, 5 May 2017, Daniel Ferreira wrote:

> Create a builtin helper for git-add--interactive, which right now is
> only able to reproduce git-add--interactive.perl's status_cmd()
> function, providing a summarized diff numstat to the user.
> 
> This is the first step in an effort to convert git-add--interactive.perl
> to a C builtin, in search for better portability, expressibility and
> performance (specially on non-POSIX systems like Windows).
> 
> Additionally, an eventual complete port of git-add--interactive would
> remove the last "big" Git script to have Perl as a dependency, allowing
> most Git users to have a NOPERL build running without big losses.
> 
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>

Good. I would think that git-add--helper is a better name. It is an
internal command only, anyway, and hopefully it will go away soon (i.e.
hopefully all of add -i will be builtin soon and can then be even moved
into a separate file outside builtin/, say, add-interactive.c).

> diff --git a/.gitignore b/.gitignore
> index 833ef3b..0d6cfe4 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -11,6 +11,7 @@
>  /git
>  /git-add
>  /git-add--interactive
> +/git-add-interactive--helper

Personally, I would prefer a separate commit adding the helper, before it
learns about any functionality. Somehow I have an easier time reviewing
patches if they tease such things apart.

> diff --git a/builtin/add-interactive--helper.c b/builtin/add-interactive--helper.c
> new file mode 100644
> index 0000000..97ca1b3
> --- /dev/null
> +++ b/builtin/add-interactive--helper.c
> @@ -0,0 +1,258 @@
> +#include "builtin.h"
> +#include "color.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +#include "revision.h"
> +
> +#define ADD_INTERACTIVE_HEADER_INDENT "      "

This definition is local to this helper, and we already know that this is
all about add -i. So maybe not repeat it? HEADER_INDENT would be catchier
and just as expressive.

> +enum add_interactive_collection_mode {
> +	COLLECTION_MODE_NONE,
> +	COLLECTION_MODE_WORKTREE,
> +	COLLECTION_MODE_INDEX
> +};

Likewise, maybe we do not want to have this COLLECTION_MODE_ prefix
littering the source code... NONE, WORKTREE and INDEX should be short and
sweet.

> +struct add_interactive_file_status {

Just struct file_status, maybe?

From a cursory glance, it also would appear that the only need for this
struct to be named is the memset() call when initializing a new item,
right? If that is the case, it would be better to use sizeof(*s->files)
and keep it unnamed, as part of the enclosing struct.

> +	int selected;
> +
> +	char path[PATH_MAX];

That is a bit wasteful. `char *name` is what diffstat_t uses...

> +	int lines_added_index;
> +	int lines_deleted_index;
> +	int lines_added_worktree;
> +	int lines_deleted_worktree;

Maybe for better readability:

	struct {
		uintmax_t added, deleted;
	} index, worktree;

> +};
>
> +struct add_interactive_status {
> +	enum add_interactive_collection_mode current_mode;

I am scared of future patches where you cannot wrap the code to the
required <= 80 columns/row because the data type or variable name is too
long... ;-)

> +
> +	const char *reference;
> +	struct pathspec pathspec;
> +
> +	int file_count;
> +	struct add_interactive_file_status *files;

The convention for growable arrays in Git is to have nr, alloc and the
array, and use ALLOC_GROW() to avoid reallocating for every single new
item.

> +};
> +
> +static int add_interactive_use_color = -1;
> +enum color_add_interactive {
> +	ADD_INTERACTIVE_PROMPT,
> +	ADD_INTERACTIVE_HEADER,
> +	ADD_INTERACTIVE_HELP,
> +	ADD_INTERACTIVE_ERROR
> +};

Again, if we were talking about a declaration in a .h file, that prefix
would make sense, but here we are safely in file-local territory where I
think this is just unnecessary churn.

> +static char add_interactive_colors[][COLOR_MAXLEN] = {
> +	GIT_COLOR_BOLD_BLUE, /* Prompt */
> +	GIT_COLOR_BOLD,      /* Header */
> +	GIT_COLOR_BOLD_RED,  /* Help */
> +	GIT_COLOR_BOLD_RED   /* Error */
> +};
> +
> +static const char *add_interactive_get_color(enum color_add_interactive ix)
> +{
> +	if (want_color(add_interactive_use_color))
> +		return add_interactive_colors[ix];
> +	return "";
> +}
> +
> +static int parse_add_interactive_color_slot(const char *slot)
> +{
> +	if (!strcasecmp(slot, "prompt"))
> +		return ADD_INTERACTIVE_PROMPT;
> +	if (!strcasecmp(slot, "header"))
> +		return ADD_INTERACTIVE_HEADER;
> +	if (!strcasecmp(slot, "help"))
> +		return ADD_INTERACTIVE_HELP;
> +	if (!strcasecmp(slot, "error"))
> +		return ADD_INTERACTIVE_ERROR;
> +
> +	return -1;
> +}
> +
> +static int git_add_interactive_config(const char *var,

Not git_add_interactive__helper_config()? ;-)

> +		const char *value, void *cbdata)
> +{
> +	const char *name;
> +
> +	if (!strcmp(var, "color.interactive")) {
> +		add_interactive_use_color = git_config_colorbool(var, value);
> +		return 0;
> +	}
> +
> +	if (skip_prefix(var, "color.interactive", &name)) {
> +		int slot = parse_add_interactive_color_slot(name);
> +		if (slot < 0)
> +			return 0;
> +		if (!value)
> +			return config_error_nonbool(var);
> +		return color_parse(value, add_interactive_colors[slot]);
> +	}
> +
> +	return git_default_config(var, value, cbdata);
> +}
> +
> +static void add_interactive_status_collect_changed_cb(struct diff_queue_struct *q,

I did have that fear, and now it comes true... ;-)

Seriously again, this is a file-local function. We know it is about
add --interactive, and we can be pretty succinct about its role, i.e.
something like collect_changes_callback() would be plenty sufficient.

> +					 struct diff_options *options,
> +					 void *data)
> +{
> +	struct add_interactive_status *s = data;
> +	struct diffstat_t stat;
> +	int i, j;
> +
> +	if (!q->nr)
> +		return;
> +
> +	memset(&stat, 0, sizeof(struct diffstat_t));

I guess it makes sense to play it safe here (by zeroing the entire struct
rather than assigning the individual initial values).

But would

	struct diffstat_t stat = { 0 };

not have done the same?

> +	for (i = 0; i < q->nr; i++) {
> +		struct diff_filepair *p;
> +		p = q->queue[i];
> +		diff_flush_stat(p, options, &stat);
> +	}
> +
> +	for (i = 0; i < stat.nr; i++) {
> +		int file_index = s->file_count;
> +		for (j = 0; j < s->file_count; j++) {
> +			if (!strcmp(s->files[j].path, stat.files[i]->name)) {
> +				file_index = j;
> +				break;
> +			}
> +		}

So basically, this is looking up in a list whether we saw the file in
question already, and the reason we have to do that is that we run the
entire shebang twice, once with the worktree and once with the index.

I wonder whether it would not make sense to switch away s->files from a
list to a hashmap.

> +		if (file_index == s->file_count) {
> +			s->file_count++;
> +			s->files = realloc(s->files,
> +					(q->nr + s->file_count) * sizeof(*s->files));

We already have ALLOC_GROW() to do this, safer. And easier to verify ;-)

> +			memset(&s->files[file_index], 0,
> +					sizeof(struct add_interactive_file_status));
> +		}
> +
> +		memcpy(s->files[file_index].path, stat.files[i]->name,
> +				strlen(stat.files[i]->name) + 1);

Apart from using PATH_MAX bytes for most likely only short names: should
this not be inside the if (file_index == s->file_count) clause? I mean, if
we do not hit that conditional code, it is because we already compared the
file name and figured out that s->files[file_index].path is identical to
stat.files[i]->name...


> +		if (s->current_mode == COLLECTION_MODE_WORKTREE) {
> +			s->files[file_index].lines_added_worktree = stat.files[i]->added;
> +			s->files[file_index].lines_deleted_worktree = stat.files[i]->deleted;
> +		} else if (s->current_mode == COLLECTION_MODE_INDEX) {
> +			s->files[file_index].lines_added_index = stat.files[i]->added;
> +			s->files[file_index].lines_deleted_index = stat.files[i]->deleted;
> +		}
> +	}
> +}
> +
> +static void add_interactive_status_collect_changes_worktree(struct add_interactive_status *s)

I would use the name status_worktree() here instead.

> +{
> +	struct rev_info rev;
> +
> +	s->current_mode = COLLECTION_MODE_WORKTREE;

Now that I read this and remember that only WORKTREE and INDEX are handled
in the callback function: is there actually a use for the NONE enum value?
I.e. is current_mode read out in any other context than the callback
function? If there is no other read, then the NONE enum value is just
confusing.

BTW in the first pass, we pretty much know that we only get unique names,
so the entire lookup is unnecessary and will just increase the time
complexity from O(n) to O(n^2). So let's avoid that.

By moving to a hashmap, you can even get the second phase down to an
expected O(n).

> +
> +	init_revisions(&rev, NULL);
> +	setup_revisions(0, NULL, &rev, NULL);
> +
> +	rev.max_count = 0;

Where does this come from? We are not logging commits... oh wait, I see
that diff-lib.c reuses that field for its diff_unmerged_stage value. Urgh.
Not your fault, of course.

> +	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> +	rev.diffopt.format_callback = add_interactive_status_collect_changed_cb;
> +	rev.diffopt.format_callback_data = s;
> +
> +	run_diff_files(&rev, 0);
> +}
> +
> +static void add_interactive_status_collect_changes_index(struct add_interactive_status *s)
> +{
> +	struct rev_info rev;
> +	struct setup_revision_opt opt;
> +
> +	s->current_mode = COLLECTION_MODE_INDEX;
> +
> +	init_revisions(&rev, NULL);
> +	memset(&opt, 0, sizeof(opt));

	struct setup_revision_opt opt = { NULL };

> +	opt.def = s->reference;
> +	setup_revisions(0, NULL, &rev, &opt);

Oh, I see, you want to use the opt argument to pass HEAD or the emtpy tree
SHA-1 if we're on an unborn branch.

Since you're already calling get_sha1() on "HEAD", you may just as well
keep the SHA-1 (and use the EMPTY_TREE_SHA1 if there is no HEAD), and then
pass that in via the argc, argv parameters to setup_revisions() (imitating
cmd_diff_index() a bit closer).

> +
> +	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> +	rev.diffopt.format_callback = add_interactive_status_collect_changed_cb;
> +	rev.diffopt.format_callback_data = s;
> +
> +	run_diff_index(&rev, 1);
> +}
> +
> +static void list_modified_into_status(struct add_interactive_status *s)
> +{
> +	add_interactive_status_collect_changes_worktree(s);
> +	add_interactive_status_collect_changes_index(s);
> +}

Why not collapse all three functions into one? It is not like they are
totally unrelated nor super-long.

> +static void print_modified(void)
> +{
> +	int i;
> +	struct add_interactive_status s;
> +	const char *modified_fmt = _("%12s %12s %s");

We cannot really translate that...

> +	const char *header_color = add_interactive_get_color(ADD_INTERACTIVE_HEADER);
> +	unsigned char sha1[20];
> +
> +	if (read_cache() < 0)
> +		return;
> +
> +	s.current_mode = COLLECTION_MODE_NONE;
> +	s.reference = !get_sha1("HEAD", sha1) ? "HEAD": EMPTY_TREE_SHA1_HEX;
> +	s.file_count = 0;
> +	s.files = NULL;
> +	list_modified_into_status(&s);
> +
> +	printf(ADD_INTERACTIVE_HEADER_INDENT);
> +	color_fprintf(stdout, header_color, modified_fmt, _("staged"),
> +			_("unstaged"), _("path"));

I think these _() need to become N_().

> +	printf("\n");

Wouldn't it be better to avoid defining modified_fmt (we do not really
modify it, do we?) and at least make the '\n' part of the format string?

> +	for (i = 0; i < s.file_count; i++) {
> +		struct add_interactive_file_status f = s.files[i];
> +		char selection = f.selected ? '*' : ' ';

I would prefer the variable to be called "marker".

> +
> +		char worktree_changes[50];
> +		char index_changes[50];

Those "50" look a bit arbitrary... maybe use strbufs instead (so that we
do not have to hope that all translations of "nothing" or "unchanged",
even Welsh ones, will fit inside those buffers)?

> +		if (f.lines_added_worktree != 0 || f.lines_deleted_worktree != 0)

In Git's source code, the convention is to just drop the `!= 0` in
comparisons, unless there is a really good reason not to.

> +			snprintf(worktree_changes, 50, "+%d/-%d", f.lines_added_worktree,
> +					f.lines_deleted_worktree);
> +		else
> +			snprintf(worktree_changes, 50, "%s", _("nothing"));
> +
> +		if (f.lines_added_index != 0 || f.lines_deleted_index != 0)
> +			snprintf(index_changes, 50, "+%d/-%d", f.lines_added_index,
> +					f.lines_deleted_index);
> +		else
> +			snprintf(index_changes, 50, "%s", _("unchanged"));
> +
> +		printf("%c%2d: ", selection, i + 1);
> +		printf(modified_fmt, index_changes, worktree_changes, f.path);
> +		printf("\n");

It would be nicer to make this a single printf() call. The only idea I
have to allow for that is to turn modified_fmt into a `#define
MODIFIED_FMT "%12s %12s %s"`, though.

> +	}
> +}
> +
> +static void status_cmd(void)
> +{
> +	print_modified();
> +}

As long as this function really only calls another function with no
parameters, let's just drop it. We can call print_modified() instead of
status_cmd() just as easily.

Ævar already commented on the parseopt stuff.

Otherwise this looks pretty good to me!

Thanks,
Johannes

  parent reply	other threads:[~2017-05-05 22:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-05 18:43 [PATCH 0/3] Port git-add--interactive.perl:status_cmd to C Daniel Ferreira
2017-05-05 18:43 ` [PATCH 1/3] diff: export diffstat interface Daniel Ferreira
2017-05-05 21:28   ` Johannes Schindelin
2017-05-05 18:43 ` [PATCH 2/3] add--interactive: add builtin helper for interactive add Daniel Ferreira
2017-05-05 20:16   ` Ævar Arnfjörð Bjarmason
2017-05-05 21:21     ` Johannes Schindelin
2017-05-05 22:09       ` Ævar Arnfjörð Bjarmason
2017-05-05 22:30   ` Johannes Schindelin [this message]
2017-05-05 22:49     ` Ævar Arnfjörð Bjarmason
2017-05-08 11:35       ` Johannes Schindelin
2017-05-05 23:13     ` Daniel Ferreira (theiostream)
2017-05-05 23:28       ` Ævar Arnfjörð Bjarmason
2017-05-08 12:15       ` Johannes Schindelin
2017-05-05 18:43 ` [PATCH 3/3] add--interactive: use add-interactive--helper for status_cmd Daniel Ferreira
2017-05-05 22:32   ` Johannes Schindelin
2017-05-05 19:31 ` [PATCH 0/3] Port git-add--interactive.perl:status_cmd to C Ævar Arnfjörð Bjarmason
2017-05-05 22:33   ` Johannes Schindelin
2017-05-05 22:35   ` Jonathan Nieder
2017-05-05 22:38 ` Johannes Schindelin
2017-05-05 23:37   ` Daniel Ferreira (theiostream)
2017-05-08 12:23     ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1.1705052328380.146734@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=bnmvco@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).