git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "Daniel Ferreira (theiostream)" <bnmvco@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/3] add--interactive: add builtin helper for interactive add
Date: Mon, 8 May 2017 14:15:21 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.1.1705081335400.146734@virtualbox> (raw)
In-Reply-To: <CAEA2_RKzUdSPP4bBvGiFVfNnAY3wwp+0LYriC4q5XfCP-1-F4w@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5292 bytes --]

Hi Daniel,

On Fri, 5 May 2017, Daniel Ferreira (theiostream) wrote:

> On Fri, May 5, 2017 at 7:30 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >> +static int git_add_interactive_config(const char *var,
> >
> > Not git_add_interactive__helper_config()? ;-)
> 
> I don't get if you mean this ironically (because of the verbosity) or
> if you do think this would be a good name ;P

Hehe. I meant it tongue-in-cheek.

So let me try again in my endeavor to provide *constructive* criticism,
i.e. not only pointing out what I think is suboptimal, but *also* how to
improve it in my opinion. How about add_i_config() or git_add_config()?

> >> +     for (i = 0; i < q->nr; i++) {
> >> +             struct diff_filepair *p;
> >> +             p = q->queue[i];
> >> +             diff_flush_stat(p, options, &stat);
> >> +     }
> >> +
> >> +     for (i = 0; i < stat.nr; i++) {
> >> +             int file_index = s->file_count;
> >> +             for (j = 0; j < s->file_count; j++) {
> >> +                     if (!strcmp(s->files[j].path, stat.files[i]->name)) {
> >> +                             file_index = j;
> >> +                             break;
> >> +                     }
> >> +             }
> >
> > So basically, this is looking up in a list whether we saw the file in
> > question already, and the reason we have to do that is that we run the
> > entire shebang twice, once with the worktree and once with the index.
> >
> > I wonder whether it would not make sense to switch away s->files from a
> > list to a hashmap.
> > [...]
> > BTW in the first pass, we pretty much know that we only get unique names,
> > so the entire lookup is unnecessary and will just increase the time
> > complexity from O(n) to O(n^2). So let's avoid that.
> >
> > By moving to a hashmap, you can even get the second phase down to an
> > expected O(n).
> 
> How would you go about implementing that hashmap (i.e. what should be
> the hash)? Does Git have any interface for it, or is there any example
> I can look after in the codebase?

The example Ævar pointed to seems to be pretty good (Michael Haggerty's
commits are in general excellent examples to follow):

	https://github.com/git-for-windows/git/commit/7d4558c462f0

In this case, we can even fold the added/deleted part into the struct:

	#include "hashmap.h"

	...

	struct file_stats {
		struct hashmap_entry entry;
		struct {
			uintmax_t added, deleted;
		} index, worktree;
		char name[FLEX_ARRAY];
	}

	...
		for (i = 0; i < stat.nr; i++) {
			struct file_stats *stats;
			const char *name = stat.files[i]->name;
			unsigned int hash = strhash(name);

			stats = s->phase == INDEX ? NULL :
				hashmap_get_from_hash(&map, hash, name);
			if (!stats) {
				FLEX_ALLOC_STR(stats, name, name);
				hashmap_entry_init(stats, hash);
				stats->index.added = stats->index.deleted = 0;
				stats->worktree.added =
					stats->worktree.deleted = 0;
				hashmap_add(&map, stats);
			}

			if (s->phase == INDEX) {
				stats->index.added = stat.files[i]->added;
				stats->index.deleted = stat.files[i]->deleted;
			} else {
				stats->worktree.added = stat.files[i]->added;
				stats->worktree.deleted = stat.files[i]->deleted;
			}
		}

But maybe it should simultaneously put those added entries into a growing
array, as they arrive sorted and we will want to output the entries
sorted, too.

Oh, this reminds me: you are reading the list in two phases, and each time
the entries arrive sorted, but the second time we still may append new
entries.

Maybe we need to sort the entries afterwards?

> > Apart from using PATH_MAX bytes for most likely only short names: [...]
> 
> If not PATH_MAX, what should I go for? Make it a strbuf? I tend to
> believe keeping that on the stack would be simpler and more optimal.

On the stack, no question. But I was talking about struct
add_interactive_file_status, which is allocated via malloc(), not
allocated on the stack/heap.

> > Now that I read this and remember that only WORKTREE and INDEX are
> > handled in the callback function: is there actually a use for the NONE
> > enum value?  I.e. is current_mode read out in any other context than
> > the callback function? If there is no other read, then the NONE enum
> > value is just confusing.
> 
> I just preferred to have a declared non-handled value than leave
> something undefined behind. I felt it might avoid headaches in the
> future with petty segfaults.

I found it a little harder to read, as I was puzzled about the NONE
value...

> > Why not collapse all three functions into one? It is not like they are
> > totally unrelated nor super-long.
> 
> To me it is a matter of personal preference to keep them separate. If
> there is, however, any technical or project-style-related reason to get
> them together, I'll certainly do it.

I was looking at it from a readability point of view. These functions
became very small, and the operation (which in my mind was one logical
two-phase operation) became too scattered to follow easily. This may be a
limitation of my brain, but I would have had an easier time if those three
functions were one function.

Ciao,
Johannes

  parent reply	other threads:[~2017-05-08 12:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-05 18:43 [PATCH 0/3] Port git-add--interactive.perl:status_cmd to C Daniel Ferreira
2017-05-05 18:43 ` [PATCH 1/3] diff: export diffstat interface Daniel Ferreira
2017-05-05 21:28   ` Johannes Schindelin
2017-05-05 18:43 ` [PATCH 2/3] add--interactive: add builtin helper for interactive add Daniel Ferreira
2017-05-05 20:16   ` Ævar Arnfjörð Bjarmason
2017-05-05 21:21     ` Johannes Schindelin
2017-05-05 22:09       ` Ævar Arnfjörð Bjarmason
2017-05-05 22:30   ` Johannes Schindelin
2017-05-05 22:49     ` Ævar Arnfjörð Bjarmason
2017-05-08 11:35       ` Johannes Schindelin
2017-05-05 23:13     ` Daniel Ferreira (theiostream)
2017-05-05 23:28       ` Ævar Arnfjörð Bjarmason
2017-05-08 12:15       ` Johannes Schindelin [this message]
2017-05-05 18:43 ` [PATCH 3/3] add--interactive: use add-interactive--helper for status_cmd Daniel Ferreira
2017-05-05 22:32   ` Johannes Schindelin
2017-05-05 19:31 ` [PATCH 0/3] Port git-add--interactive.perl:status_cmd to C Ævar Arnfjörð Bjarmason
2017-05-05 22:33   ` Johannes Schindelin
2017-05-05 22:35   ` Jonathan Nieder
2017-05-05 22:38 ` Johannes Schindelin
2017-05-05 23:37   ` Daniel Ferreira (theiostream)
2017-05-08 12:23     ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1.1705081335400.146734@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=bnmvco@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).