git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Slavica Djukic via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Phillip Wood" <phillip.wood@dunelm.org.uk>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Slavica Djukic" <slawica92@hotmail.com>
Subject: Re: [PATCH v5 03/10] add-interactive.c: implement list_modified
Date: Thu, 21 Feb 2019 11:06:45 -0800	[thread overview]
Message-ID: <xmqqtvgxt0ze.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <8790ffaa394603279927f9cd4c80f1d06bb5f976.1550662887.git.gitgitgadget@gmail.com> (Slavica Djukic via GitGitGadget's message of "Wed, 20 Feb 2019 03:41:30 -0800 (PST)")

"Slavica Djukic via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/add-interactive.c b/add-interactive.c
> new file mode 100644
> index 0000000000..7d096207d4
> --- /dev/null
> +++ b/add-interactive.c
> @@ -0,0 +1,188 @@
> +#include "cache.h"
> +#include "commit.h"
> +#include "color.h"
> +#include "config.h"
> +#include "diffcore.h"
> +#include "revision.h"
> +
> +enum collection_phase {
> +	WORKTREE,
> +	INDEX
> +};
> +
> +struct file_stat {
> +	struct hashmap_entry ent;
> +	struct {
> +		uintmax_t added, deleted;
> +	} index, worktree;
> +	char name[FLEX_ARRAY];
> +};
> +
> +struct collection_status {
> +	enum collection_phase phase;
> +
> +	const char *reference;
> +	struct pathspec pathspec;
> +
> +	struct hashmap file_map;
> +};
> +
> +static int hash_cmp(const void *unused_cmp_data, const void *entry,
> +		    const void *entry_or_key, const void *keydata)
> +{
> +	const struct file_stat *e1 = entry, *e2 = entry_or_key;
> +	const char *name = keydata ? keydata : e2->name;
> +
> +	return strcmp(e1->name, name);
> +}

This one is used with hashmap API, which gives unfortunate name
"cmp_fn" to its callback, which does not necessarily need to
compare.  It only needs to say "are they different (yes/no)?"
without having to order them (iow, it would not break anything if
the above is changed to return strcmp(name, e1->name)).

Rather than naming the callback after the name of the API that calls
it (i.e. "hash"), it would make it more descriptive if it were named
after what is being compared for equality, in this case, pathname.
Perhaps "pathname_equal()" or something?

> +static int alphabetical_cmp(const void *a, const void *b)
> +{
> +	struct file_stat *f1 = *((struct file_stat **)a);
> +	struct file_stat *f2 = *((struct file_stat **)b);
> +
> +	return strcmp(f1->name, f2->name);
> +}

This one is used with qsort(), and there are two things you may want
the function name to tell the readers.  "what is compared" and "how
it is compared".  Alphabetical is a bit fuzzy (different people sort
"a.txt", "A.txt", and "b.txt" in different order) and it gives a
piece of information of secondary importance (i.e. it only starts to
make sense after you know you are ordering pathnames), so perhaps
"pathname_cmp()" or something may be a better name.

> +
> +static void collect_changes_cb(struct diff_queue_struct *q,
> +			       struct diff_options *options,
> +			       void *data)
> +{
> +	struct collection_status *s = data;
> +	struct diffstat_t stat = { 0 };
> +	int i;
> +
> +	if (!q->nr)
> +		return;
> +
> +	compute_diffstat(options, &stat);

Remember what I said on patch 01/10?  This callback function is
designed to take arbitrary diff_queue_struct q as its parameter and
work on it, but compute_diffstat() is misdesigned and cannot work on
anything but the singleton diff_queued_diff, which is an impedance
mismatch that can easily be corrected by going back to 01/10 and
fixing it.

> +	for (i = 0; i < stat.nr; i++) {
> +		struct file_stat *entry;
> +		const char *name = stat.files[i]->name;
> +		unsigned int hash = strhash(name);
> +
> +		entry = hashmap_get_from_hash(&s->file_map, hash, name);
> +		if (!entry) {
> +			FLEX_ALLOC_STR(entry, name, name);
> +			hashmap_entry_init(entry, hash);
> +			hashmap_add(&s->file_map, entry);
> +		}

The path may already be in the collection_status.file_map from the
previous run when "diff-index --cached" is run, in which case we avoid
adding it twice, which makes sense.

> +		if (s->phase == WORKTREE) {
> +			entry->worktree.added = stat.files[i]->added;
> +			entry->worktree.deleted = stat.files[i]->deleted;
> +		} else if (s->phase == INDEX) {
> +			entry->index.added = stat.files[i]->added;
> +			entry->index.deleted = stat.files[i]->deleted;
> +		}

As the set of phases will not going to grow, not having the final
'else BUG("phase is neither WORKTREE nor INDEX");' here is OK.

But stepping back a bit, if we know we will not grow the phases,
then it may be simpler *and* equally descriptive to rename .phase
field to a boolean ".collecting_from_index" that literally means
"are we collecting from the index?" and that way we can also get rid
of the enum.

> +	}
> +}
> +
> +static void collect_changes_worktree(struct collection_status *s)
> +{
> +	struct rev_info rev;
> +
> +	s->phase = WORKTREE;
> +
> +	init_revisions(&rev, NULL);
> +	setup_revisions(0, NULL, &rev, NULL);
> +
> +	rev.max_count = 0;

The convention to (ab)use the max_count field to specify the
unmerged stage, against which the working tree file is compared for
an unmerged path, is one of the older quirks of the run_diff_files()
API.  "git diff-files" itself when running cmd_diff_files() leaves
rev.max_count untouched to get a normal output (as opposed to the
case when it is told to do --base/--ours/--theirs), so it ends up
passing -1 in this field in such a case.

This explicit assignment to 0 must be explained with in-code comment
why we want to do so.

> +	rev.diffopt.flags.ignore_dirty_submodules = 1;

OK, this matches what "sub list_modified" does.

> +	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> +	rev.diffopt.format_callback = collect_changes_cb;
> +	rev.diffopt.format_callback_data = s;
> +	run_diff_files(&rev, 0);
> +}
> +
> +static void collect_changes_index(struct collection_status *s)
> +{
> +	struct rev_info rev;
> +	struct setup_revision_opt opt = { 0 };
> +
> +	s->phase = INDEX;
> +
> +	init_revisions(&rev, NULL);
> +	opt.def = s->reference;
> +	setup_revisions(0, NULL, &rev, &opt);
> +
> +	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> +	rev.diffopt.format_callback = collect_changes_cb;
> +	rev.diffopt.format_callback_data = s;
> +
> +	run_diff_index(&rev, 1);
> +}
> +
> +static int is_inital_commit(void)
> +{
> +	struct object_id sha1;
> +	if (get_oid("HEAD", &sha1))

Using the newer get_oid() API (as opposed to get_sha1() API) while
calling the variable that receives its result "sha1" is an
oxymoron.  Most users of the API call it "oid", I think.

> +		return 1;
> +	return 0;
> +}

inital???

Grep for "unborn" in the codebase.  It probably makes sense to call
it on_unborn_branch() instead.

	static int on_unborn_branch(void)
	{
		struct object_id oid;
		return !!get_oid("HEAD", &oid);
	}

Eventually, the users of "unborn" in sequencer.c and builtin/reset.c
may want to share the implementation but the helper is so small that
we probably should not worry about it until the topic is done and
leave it for a later clean-up.

> +static const char *get_diff_reference(void)
> +{
> +	if(is_inital_commit())

Style.  Do not forget SP between "if" and "(".

> +		return empty_tree_oid_hex();
> +	return "HEAD";
> +}

Although probably

	return on_unborn_branch() ? empty_tree_oid_hex() : "HEAD";

is clear enough.

> +static void filter_files(const char *filter, struct hashmap *file_map,
> +			 struct file_stat **files)
> +{
> +
> +	for (int i = 0; i < hashmap_get_size(file_map); i++) {

The declaration of "int i" there is one of the C99ism we do not use,
IIRC ("git grep 'for (int ' \*.c" comes up empty).

> +		struct file_stat *f = files[i];
> +
> +		if ((!(f->worktree.added || f->worktree.deleted)) &&
> +		   (!strcmp(filter, "file-only")))
> +				hashmap_remove(file_map, f, NULL);
> +
> +		if ((!(f->index.added || f->index.deleted)) &&
> +		   (!strcmp(filter, "index-only")))
> +				hashmap_remove(file_map, f, NULL);
> +	}

As part of the "interactive" machinery, obvious inefficiency in the
code may not matter that much, but having to make exact string
comparison for an unmodified parameter twice for each iteration does
not look like showing a good taste.

What is the earliest time the code can decide that it is asked to do
a "file-only" filtering?  What I am trying to get at is that if we
know early enough, instead of grabbing outputs from "diff-files" and
"diff-index --cached", only to drop such entries that do not have
output in "diff-files", we could run only "diff-index --cached"
without spending any cycle to run "diff-files" in the first place.

Even if such a reorganizing of the callflow is impossible (I didn't
check), at least a micro optimization would be:

	int i;
	int file_only = !strcmp(filter, "file-only");
	int index_only = !strcmp(filter, "index-only");

	for (i = 0; i < hashmap_get_size(file_map), i++) {
		struct file_stat *f = files[i];
		if (file_only && !(f->worktree.added || f->worktree.deleted))
			hashmap_remove(file_map, f, NULL);
		...
	}



> +}
> +
> +static struct collection_status *list_modified(struct repository *r, const char *filter)
> +{
> +	int i = 0;
> +	struct collection_status *s = xcalloc(1, sizeof(*s));
> +	struct hashmap_iter iter;
> +	struct file_stat **files;
> +	struct file_stat *entry;
> +
> +	if (repo_read_index(r) < 0) {
> +		printf("\n");
> +		return NULL;
> +	}
> +
> +	s->reference = get_diff_reference();
> +	hashmap_init(&s->file_map, hash_cmp, NULL, 0);
> +
> +	collect_changes_worktree(s);
> +	collect_changes_index(s);
> +
> +	if (hashmap_get_size(&s->file_map) < 1) {
> +		printf("\n");
> +		return NULL;
> +	}
> +
> +	hashmap_iter_init(&s->file_map, &iter);
> +
> +	files = xcalloc(hashmap_get_size(&s->file_map), sizeof(struct file_stat *));
> +	while ((entry = hashmap_iter_next(&iter))) {
> +		files[i++] = entry;
> +	}
> +	QSORT(files, hashmap_get_size(&s->file_map), alphabetical_cmp);
> +
> +	if (filter)
> +		filter_files(filter, &s->file_map, files);

Ahh, this answers my previous question, no?  The "filter" parameter
given to this function will never change, so we could inspect it
even before we call collect_chagnes_{worktree,index}().  If the
caller says "I want to see only the paths that appear in the
diff-files output", there is no need to call collect_changes_index()
at all, no?

> +
> +	free(files);
> +	return s;
> +}

  reply	other threads:[~2019-02-21 19:06 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-20 12:34 [PATCH 0/7] Turn git add-i into built-in Johannes Schindelin
2018-12-20 12:09 ` [PATCH 1/7] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2018-12-20 12:09 ` [PATCH 2/7] add--helper: create builtin helper for interactive add Daniel Ferreira via GitGitGadget
2018-12-20 12:09 ` [PATCH 3/7] add-interactive.c: implement status command Daniel Ferreira via GitGitGadget
2018-12-20 12:09 ` [PATCH 4/7] add--interactive.perl: use add--helper --status for status_cmd Daniel Ferreira via GitGitGadget
2018-12-20 12:09 ` [PATCH 5/7] add-interactive.c: implement show-help command Slavica Djukic via GitGitGadget
2019-01-14 11:12   ` Phillip Wood
2018-12-20 12:09 ` [PATCH 6/7] Git.pm: introduce environment variable GIT_TEST_PRETEND_TTY Slavica Djukic via GitGitGadget
2019-01-14 11:13   ` Phillip Wood
2019-01-15 12:45     ` Slavica Djukic
2019-01-15 13:50     ` Johannes Schindelin
2019-01-15 16:09       ` Phillip Wood
2018-12-20 12:09 ` [PATCH 7/7] add--interactive.perl: use add--helper --show-help for help_cmd Slavica Djukic via GitGitGadget
2019-01-14 11:17   ` Phillip Wood
2018-12-20 12:37 ` [PATCH 0/7] Turn git add-i into built-in Johannes Schindelin
2019-01-11 14:09 ` Slavica Djukic
2019-01-18  7:47 ` [PATCH v2 " Slavica Đukić via GitGitGadget
2019-01-18  7:47   ` [PATCH v2 1/7] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-01-18  7:47   ` [PATCH v2 2/7] add--helper: create builtin helper for interactive add Daniel Ferreira via GitGitGadget
2019-01-18  7:47   ` [PATCH v2 3/7] add-interactive.c: implement status command Daniel Ferreira via GitGitGadget
2019-01-18  7:47   ` [PATCH v2 4/7] add--interactive.perl: use add--helper --status for status_cmd Daniel Ferreira via GitGitGadget
2019-01-18  7:47   ` [PATCH v2 5/7] add-interactive.c: implement show-help command Slavica Djukic via GitGitGadget
2019-01-18 11:20     ` Phillip Wood
2019-01-18 12:19       ` Slavica Djukic
     [not found]       ` <VI1PR05MB577331CCE110D2EAE325927CA69C0@VI1PR05MB5773.eurprd05.prod.outlook.com>
2019-01-18 14:25         ` Phillip Wood
2019-01-18 20:40           ` Johannes Schindelin
2019-01-18  7:47   ` [PATCH v2 6/7] t3701-add-interactive: test add_i_show_help() Slavica Djukic via GitGitGadget
2019-01-18 11:23     ` Phillip Wood
2019-01-18  7:47   ` [PATCH v2 7/7] add--interactive.perl: use add--helper --show-help for help_cmd Slavica Djukic via GitGitGadget
2019-01-21  9:13   ` [PATCH v3 0/7] Turn git add-i into built-in Slavica Đukić via GitGitGadget
2019-01-21  9:13     ` [PATCH v3 1/7] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-01-21  9:13     ` [PATCH v3 3/7] add-interactive.c: implement status command Daniel Ferreira via GitGitGadget
2019-01-21  9:13     ` [PATCH v3 2/7] add--helper: create builtin helper for interactive add Daniel Ferreira via GitGitGadget
2019-01-21  9:13     ` [PATCH v3 4/7] add--interactive.perl: use add--helper --status for status_cmd Daniel Ferreira via GitGitGadget
2019-01-21  9:13     ` [PATCH v3 5/7] add-interactive.c: implement show-help command Slavica Djukic via GitGitGadget
2019-01-21  9:13     ` [PATCH v3 6/7] t3701-add-interactive: test add_i_show_help() Slavica Djukic via GitGitGadget
2019-01-25 11:01       ` Phillip Wood
2019-01-25 11:36         ` Slavica Djukic
2019-01-21  9:13     ` [PATCH v3 7/7] add--interactive.perl: use add--helper --show-help for help_cmd Slavica Djukic via GitGitGadget
2019-01-21  9:59       ` Ævar Arnfjörð Bjarmason
2019-01-21 11:59         ` Slavica Djukic
2019-01-25 12:23     ` [PATCH v4 0/7] Turn git add-i into built-in Slavica Đukić via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 1/7] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 2/7] add--helper: create builtin helper for interactive add Daniel Ferreira via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 3/7] add-interactive.c: implement status command Daniel Ferreira via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 4/7] add--interactive.perl: use add--helper --status for status_cmd Daniel Ferreira via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 6/7] t3701-add-interactive: test add_i_show_help() Slavica Djukic via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 5/7] add-interactive.c: implement show-help command Slavica Djukic via GitGitGadget
2019-01-25 12:23       ` [PATCH v4 7/7] add--interactive.perl: use add--helper --show-help for help_cmd Slavica Djukic via GitGitGadget
2019-01-25 12:37       ` [PATCH v4 0/7] Turn git add-i into built-in Slavica Djukic
2019-02-01 14:37         ` Phillip Wood
2019-02-20 11:41       ` [PATCH v5 00/10] " Slavica Đukić via GitGitGadget
2019-02-20 11:41         ` [PATCH v5 02/10] add--helper: create builtin helper for interactive add Daniel Ferreira via GitGitGadget
2019-02-21 17:56           ` Junio C Hamano
2019-03-08 20:48             ` Johannes Schindelin
2019-02-20 11:41         ` [PATCH v5 01/10] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-02-21 17:53           ` Junio C Hamano
2019-02-22  9:03             ` Slavica Djukic
2019-02-20 11:41         ` [PATCH v5 03/10] add-interactive.c: implement list_modified Slavica Djukic via GitGitGadget
2019-02-21 19:06           ` Junio C Hamano [this message]
2019-02-21 20:27             ` Junio C Hamano
2019-02-22 12:18               ` Slavica Djukic
2019-02-22 11:35             ` Slavica Djukic
2019-02-20 11:41         ` [PATCH v5 04/10] add-interactive.c: implement list_and_choose Slavica Djukic via GitGitGadget
2019-02-22 21:46           ` Junio C Hamano
2019-03-01 11:20             ` Slavica Djukic
2019-02-20 11:41         ` [PATCH v5 05/10] add-interactive.c: implement status command Slavica Djukic via GitGitGadget
2019-02-22 22:11           ` Junio C Hamano
2019-03-01 11:08             ` Slavica Djukic
2019-02-20 11:41         ` [PATCH v5 06/10] add--interactive.perl: use add--helper --status for status_cmd Daniel Ferreira via GitGitGadget
2019-02-20 11:41         ` [PATCH v5 07/10] add-interactive.c: add support for list_only option Slavica Djukic via GitGitGadget
2019-02-20 11:41         ` [PATCH v5 08/10] add-interactive.c: implement show-help command Slavica Djukic via GitGitGadget
2019-02-20 11:41         ` [PATCH v5 09/10] t3701-add-interactive: test add_i_show_help() Slavica Djukic via GitGitGadget
2019-02-20 11:41         ` [PATCH v5 10/10] add--interactive.perl: use add--helper --show-help for help_cmd Slavica Djukic via GitGitGadget
2019-02-22  4:53         ` [PATCH v5 00/10] Turn git add-i into built-in Junio C Hamano
2019-03-04 10:49         ` End of Outreachy internship Slavica Djukic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtvgxt0ze.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=slawica92@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).