git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Slavica Djukic via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Slavica Djukic <slawica92@hotmail.com>
Subject: Re: [PATCH 07/11] Add a function to determine unique prefixes for a list of strings
Date: Mon, 13 May 2019 14:48:18 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1905131405210.44@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <1be9b470-6daa-a50e-ba3a-432520721b0f@jeffhostetler.com>

Hi Jeff,

On Thu, 18 Apr 2019, Jeff Hostetler wrote:

> On 4/10/2019 1:37 PM, Slavica Djukic via GitGitGadget wrote:
> > From: Slavica Djukic <slawica92@hotmail.com>
> >
> > In the `git add -i` command, we show unique prefixes of the commands and
> > files, to give an indication what prefix would select them.
> >
> > Naturally, the C implementation looks a lot different than the Perl
> > implementation: in Perl, a trie is much easier implemented, while we
> > already have a pretty neat hashmap implementation in C that we use for
> > the purpose of storing (not necessarily unique) prefixes.
> >
> > The idea: for each item that we add, we generate prefixes starting with
> > the first letter, then the first two letters, then three, etc, until we
> > find a prefix that is unique (or until the prefix length would be
> > longer than we want). If we encounter a previously-unique prefix on the
> > way, we adjust that item's prefix to make it unique again (or we mark it
> > as having no unique prefix if we failed to find one). These partial
> > prefixes are stored in a hash map (for quick lookup times).
> >
> > To make sure that this function works as expected, we add a test using a
> > special-purpose test helper that was added for that purpose.
> >
> > Note: We expect the list of prefix items to be passed in as a list of
> > pointers rather than as regular list to avoid having to copy information
> > (the actual items will most likely contain more information than just
> > the name and the length of the unique prefix, but passing in `struct
> > prefix_item *` would not allow for that).
> >
> > Signed-off-by: Slavica Djukic <slawica92@hotmail.com>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/prefix-map.c b/prefix-map.c
> > new file mode 100644
> > index 0000000000..3c5ae4ae0a
> > --- /dev/null
> > +++ b/prefix-map.c
> > @@ -0,0 +1,111 @@
> > +#include "cache.h"
> > +#include "prefix-map.h"
> > +
> > +static int map_cmp(const void *unused_cmp_data,
> > +		   const void *entry,
> > +		   const void *entry_or_key,
> > +		   const void *unused_keydata)
> > +{
> > +	const struct prefix_map_entry *a = entry;
> > +	const struct prefix_map_entry *b = entry_or_key;
> > +
> > +	return a->prefix_length != b->prefix_length ||
> > +		strncmp(a->name, b->name, a->prefix_length);
> > +}
> > +
> > +static void add_prefix_entry(struct hashmap *map, const char *name,
> > +			     size_t prefix_length, struct prefix_item *item)
> > +{
> > +	struct prefix_map_entry *result = xmalloc(sizeof(*result));
> > +	result->name = name;
> > +	result->prefix_length = prefix_length;
> > +	result->item = item;
> > +	hashmap_entry_init(result, memhash(name, prefix_length));
> > +	hashmap_add(map, result);
> > +}
> > +
> > +static void init_prefix_map(struct prefix_map *prefix_map,
> > +			    int min_prefix_length, int max_prefix_length)
> > +{
> > +	hashmap_init(&prefix_map->map, map_cmp, NULL, 0);
> > +	prefix_map->min_length = min_prefix_length;
> > +	prefix_map->max_length = max_prefix_length;
> > +}
> > +
> > +static void add_prefix_item(struct prefix_map *prefix_map,
> > +			    struct prefix_item *item)
> > +{
> > +	struct prefix_map_entry *e = xmalloc(sizeof(*e)), *e2;
> > +	int j;
> > +
> > +	e->item = item;
> > +	e->name = e->item->name;
> > +
> > +	for (j = prefix_map->min_length; j <= prefix_map->max_length; j++) {
> > +		if (!isascii(e->name[j])) {
>
> This feels odd, if I understand the intent.
>
> First, why "isascii()" rather than just non-zero?

That's to imitate `git-add--interactive.perl`'s

	if (ord($letters[0]) > 127 ||
	    ($soft_limit && $j + 1 > $soft_limit))

See https://github.com/git/git/blob/v2.21.0/git-add--interactive.perl#L410
for more complete context.

I think the main benefit here is that we avoid running into the trap of
using incomplete UTF-8 multi-byte sequences in prefixes.

I guess we could throw in an extra safety on the C side by excluding
control characters, too. But that would be a deviation from Perl, and I
actually do not even feel strongly about excluding, say, a HT (horizontal
tab) from the prefixes.

> But mainly, can we walk off the end of the array and read
> potentially uninitialized memory?  Shouldn't we have something
> at the top of the function like:
>
>     len = strlen(item->name);
>     if (len < prefix_map->min_length)
>         return;

Ooops, you're right. But I would not use `strlen() here, we can easily
just add `&& e->name[j]` to the loop condition.

> (And maybe avoid the xmalloc() too?)

Hmm. At first, I thought: no, we use `*e` *both* for lookup and for adding
a new item once we did not find any existing for the current prefix
length.

But it does indeed become a lot clearer when I separate those. It's not
even performance or memory critical a code path.

> And maybe do " j <= min(len, max_length) " in the loop?
> But I see you're modifying "j" down in the body of the loop,
> so I'll wait on suggesting that.
>
> > +			free(e);
> > +			break;
> > +		}
> > +
> > +		e->prefix_length = j;
> > +		hashmap_entry_init(e, memhash(e->name, j));
> > +		e2 = hashmap_get(&prefix_map->map, e, NULL);
> > +		if (!e2) {
> > +			/* prefix is unique so far */
> > +			e->item->prefix_length = j;
> > +			hashmap_add(&prefix_map->map, e);
> > +			break;
> > +		}
> > +
> > +		if (!e2->item)
> > +			continue; /* non-unique prefix */
> > +
> > +		if (j != e2->item->prefix_length)
> > +			BUG("unexpected prefix length: %d != %d",
> > +			    (int)j, (int)e2->item->prefix_length);
>
> IIUC, this assurance comes directly from map_cmp(), right?
> We could strengthen this to
>      (j != e2->item->prefix_length || strncmp(...))
> if we wanted to, right?

Right, I'll actually go for `memcmp()` here, but the idea is the same.

> > +
> > +		/* skip common prefix */
> > +		for (; j < prefix_map->max_length && e->name[j]; j++) {
> > +			if (e->item->name[j] != e2->item->name[j])
> > +				break;
>
> Same comment here about walking off of the defined end of both arrays.

Actually, no, not here, as I already test for `e->name[j]` in the loop
condition. If we reach the end of `e2->item->name`, the inner condition
will break out of the loop.

> I'm going to stop here.  I'm getting confused.

Oh no ;-)

Thank you for your helpful comments!

Ciao,
Dscho

  reply	other threads:[~2019-05-13 12:48 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-10 17:37 [PATCH 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-04-18 14:31   ` Jeff Hostetler
2019-04-18 16:06     ` Jeff King
2019-04-30 23:40       ` Johannes Schindelin
2019-05-01  2:21         ` Jeff King
2019-05-13 11:14           ` Johannes Schindelin
2019-04-10 17:37 ` [PATCH 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-04-10 17:37 ` [PATCH 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-04-10 17:37 ` [PATCH 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-04-18 16:49   ` Jeff Hostetler
2019-05-13 12:04     ` Johannes Schindelin
2019-04-10 17:37 ` [PATCH 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-04-18 17:57   ` Jeff Hostetler
2019-05-13 12:48     ` Johannes Schindelin [this message]
2019-04-10 17:37 ` [PATCH 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-04-10 17:37 ` [PATCH 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-04-10 17:37 ` [PATCH 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-05-13 17:27   ` [PATCH v2 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-05-13 17:28   ` [PATCH v2 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-05-13 17:28   ` [PATCH v2 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-05-13 17:28   ` [PATCH v2 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-07-16 14:58   ` [PATCH v3 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-07-31 17:52       ` Junio C Hamano
2019-08-26 21:26         ` Johannes Schindelin
2019-08-27 22:25           ` Junio C Hamano
2019-08-28 15:06             ` Johannes Schindelin
2019-08-28 15:37               ` Junio C Hamano
2019-07-16 14:58     ` [PATCH v3 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-07-31 17:59       ` Junio C Hamano
2019-08-27  9:22         ` Johannes Schindelin
2019-07-16 14:58     ` [PATCH v3 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-07-31 18:12       ` Junio C Hamano
2019-08-27 10:04         ` Johannes Schindelin
2019-07-16 14:58     ` [PATCH v3 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-07-31 18:14       ` Junio C Hamano
2019-07-16 14:58     ` [PATCH v3 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-07-31 18:39       ` Junio C Hamano
2019-08-24 12:38       ` SZEDER Gábor
2019-08-27 12:14         ` Johannes Schindelin
2019-08-28 16:30           ` SZEDER Gábor
2019-08-28 16:34             ` [PATCH] [PoC] A simpler find_unique_prefixes() implementation SZEDER Gábor
2019-08-30 20:12               ` Johannes Schindelin
2019-07-16 14:58     ` [PATCH v3 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-07-16 14:58     ` [PATCH v3 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-08-02 21:04       ` Junio C Hamano
2019-08-02 22:26         ` Jeff King
2019-07-16 18:38     ` [PATCH v3 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin
2019-08-02 21:06       ` Junio C Hamano
2019-08-27 12:57     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-08-27 12:57       ` [PATCH v4 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-08-27 12:58       ` [PATCH v4 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-08-27 12:58       ` [PATCH v4 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-08-27 12:58       ` [PATCH v4 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-11-04 12:15       ` [PATCH v5 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-08  4:49           ` Junio C Hamano
2019-11-09 11:06             ` Johannes Schindelin
2019-11-10  7:18               ` Junio C Hamano
2019-11-11  9:15                 ` Johannes Schindelin
2019-11-11 12:09                   ` Junio C Hamano
2019-11-12 15:03                     ` Johannes Schindelin
2019-11-13  3:54                       ` Junio C Hamano
2019-11-13 12:30                         ` Johannes Schindelin
2019-11-13 14:01                           ` Junio C Hamano
2019-11-04 12:15         ` [PATCH v5 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-08  4:56           ` Junio C Hamano
2019-11-04 12:15         ` [PATCH v5 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-08  5:01           ` Junio C Hamano
2019-11-04 12:15         ` [PATCH v5 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-08  5:17           ` Junio C Hamano
2019-11-09 11:21             ` Johannes Schindelin
2019-11-04 12:15         ` [PATCH v5 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-04 12:15         ` [PATCH v5 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
2019-11-13 12:40         ` [PATCH v6 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-11-13 12:40           ` [PATCH v6 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-14  2:15             ` Junio C Hamano
2019-11-14 15:07               ` Johannes Schindelin
2019-11-15  4:35                 ` Junio C Hamano
2019-11-13 12:40           ` [PATCH v6 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-13 12:40           ` [PATCH v6 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-13 12:41           ` [PATCH v6 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
2019-11-13 12:46           ` [PATCH v6 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin
2019-11-15 11:11           ` [PATCH v7 " Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-15 11:11             ` [PATCH v7 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1905131405210.44@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=slawica92@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).