From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Slavica Djukic via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Slavica Djukic <slawica92@hotmail.com>
Subject: Re: [PATCH 07/11] Add a function to determine unique prefixes for a list of strings
Date: Mon, 13 May 2019 14:48:18 +0200 (CEST) [thread overview]
Message-ID: <nycvar.QRO.7.76.6.1905131405210.44@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <1be9b470-6daa-a50e-ba3a-432520721b0f@jeffhostetler.com>
Hi Jeff,
On Thu, 18 Apr 2019, Jeff Hostetler wrote:
> On 4/10/2019 1:37 PM, Slavica Djukic via GitGitGadget wrote:
> > From: Slavica Djukic <slawica92@hotmail.com>
> >
> > In the `git add -i` command, we show unique prefixes of the commands and
> > files, to give an indication what prefix would select them.
> >
> > Naturally, the C implementation looks a lot different than the Perl
> > implementation: in Perl, a trie is much easier implemented, while we
> > already have a pretty neat hashmap implementation in C that we use for
> > the purpose of storing (not necessarily unique) prefixes.
> >
> > The idea: for each item that we add, we generate prefixes starting with
> > the first letter, then the first two letters, then three, etc, until we
> > find a prefix that is unique (or until the prefix length would be
> > longer than we want). If we encounter a previously-unique prefix on the
> > way, we adjust that item's prefix to make it unique again (or we mark it
> > as having no unique prefix if we failed to find one). These partial
> > prefixes are stored in a hash map (for quick lookup times).
> >
> > To make sure that this function works as expected, we add a test using a
> > special-purpose test helper that was added for that purpose.
> >
> > Note: We expect the list of prefix items to be passed in as a list of
> > pointers rather than as regular list to avoid having to copy information
> > (the actual items will most likely contain more information than just
> > the name and the length of the unique prefix, but passing in `struct
> > prefix_item *` would not allow for that).
> >
> > Signed-off-by: Slavica Djukic <slawica92@hotmail.com>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/prefix-map.c b/prefix-map.c
> > new file mode 100644
> > index 0000000000..3c5ae4ae0a
> > --- /dev/null
> > +++ b/prefix-map.c
> > @@ -0,0 +1,111 @@
> > +#include "cache.h"
> > +#include "prefix-map.h"
> > +
> > +static int map_cmp(const void *unused_cmp_data,
> > + const void *entry,
> > + const void *entry_or_key,
> > + const void *unused_keydata)
> > +{
> > + const struct prefix_map_entry *a = entry;
> > + const struct prefix_map_entry *b = entry_or_key;
> > +
> > + return a->prefix_length != b->prefix_length ||
> > + strncmp(a->name, b->name, a->prefix_length);
> > +}
> > +
> > +static void add_prefix_entry(struct hashmap *map, const char *name,
> > + size_t prefix_length, struct prefix_item *item)
> > +{
> > + struct prefix_map_entry *result = xmalloc(sizeof(*result));
> > + result->name = name;
> > + result->prefix_length = prefix_length;
> > + result->item = item;
> > + hashmap_entry_init(result, memhash(name, prefix_length));
> > + hashmap_add(map, result);
> > +}
> > +
> > +static void init_prefix_map(struct prefix_map *prefix_map,
> > + int min_prefix_length, int max_prefix_length)
> > +{
> > + hashmap_init(&prefix_map->map, map_cmp, NULL, 0);
> > + prefix_map->min_length = min_prefix_length;
> > + prefix_map->max_length = max_prefix_length;
> > +}
> > +
> > +static void add_prefix_item(struct prefix_map *prefix_map,
> > + struct prefix_item *item)
> > +{
> > + struct prefix_map_entry *e = xmalloc(sizeof(*e)), *e2;
> > + int j;
> > +
> > + e->item = item;
> > + e->name = e->item->name;
> > +
> > + for (j = prefix_map->min_length; j <= prefix_map->max_length; j++) {
> > + if (!isascii(e->name[j])) {
>
> This feels odd, if I understand the intent.
>
> First, why "isascii()" rather than just non-zero?
That's to imitate `git-add--interactive.perl`'s
if (ord($letters[0]) > 127 ||
($soft_limit && $j + 1 > $soft_limit))
See https://github.com/git/git/blob/v2.21.0/git-add--interactive.perl#L410
for more complete context.
I think the main benefit here is that we avoid running into the trap of
using incomplete UTF-8 multi-byte sequences in prefixes.
I guess we could throw in an extra safety on the C side by excluding
control characters, too. But that would be a deviation from Perl, and I
actually do not even feel strongly about excluding, say, a HT (horizontal
tab) from the prefixes.
> But mainly, can we walk off the end of the array and read
> potentially uninitialized memory? Shouldn't we have something
> at the top of the function like:
>
> len = strlen(item->name);
> if (len < prefix_map->min_length)
> return;
Ooops, you're right. But I would not use `strlen() here, we can easily
just add `&& e->name[j]` to the loop condition.
> (And maybe avoid the xmalloc() too?)
Hmm. At first, I thought: no, we use `*e` *both* for lookup and for adding
a new item once we did not find any existing for the current prefix
length.
But it does indeed become a lot clearer when I separate those. It's not
even performance or memory critical a code path.
> And maybe do " j <= min(len, max_length) " in the loop?
> But I see you're modifying "j" down in the body of the loop,
> so I'll wait on suggesting that.
>
> > + free(e);
> > + break;
> > + }
> > +
> > + e->prefix_length = j;
> > + hashmap_entry_init(e, memhash(e->name, j));
> > + e2 = hashmap_get(&prefix_map->map, e, NULL);
> > + if (!e2) {
> > + /* prefix is unique so far */
> > + e->item->prefix_length = j;
> > + hashmap_add(&prefix_map->map, e);
> > + break;
> > + }
> > +
> > + if (!e2->item)
> > + continue; /* non-unique prefix */
> > +
> > + if (j != e2->item->prefix_length)
> > + BUG("unexpected prefix length: %d != %d",
> > + (int)j, (int)e2->item->prefix_length);
>
> IIUC, this assurance comes directly from map_cmp(), right?
> We could strengthen this to
> (j != e2->item->prefix_length || strncmp(...))
> if we wanted to, right?
Right, I'll actually go for `memcmp()` here, but the idea is the same.
> > +
> > + /* skip common prefix */
> > + for (; j < prefix_map->max_length && e->name[j]; j++) {
> > + if (e->item->name[j] != e2->item->name[j])
> > + break;
>
> Same comment here about walking off of the defined end of both arrays.
Actually, no, not here, as I already test for `e->name[j]` in the loop
condition. If we reach the end of `e2->item->name`, the inner condition
will break out of the loop.
> I'm going to stop here. I'm getting confused.
Oh no ;-)
Thank you for your helpful comments!
Ciao,
Dscho
next prev parent reply other threads:[~2019-05-13 12:48 UTC|newest]
Thread overview: 124+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-10 17:37 [PATCH 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-04-18 14:31 ` Jeff Hostetler
2019-04-18 16:06 ` Jeff King
2019-04-30 23:40 ` Johannes Schindelin
2019-05-01 2:21 ` Jeff King
2019-05-13 11:14 ` Johannes Schindelin
2019-04-10 17:37 ` [PATCH 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-04-10 17:37 ` [PATCH 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-04-10 17:37 ` [PATCH 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-04-18 17:57 ` Jeff Hostetler
2019-05-13 12:48 ` Johannes Schindelin [this message]
2019-04-10 17:37 ` [PATCH 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-04-18 16:49 ` Jeff Hostetler
2019-05-13 12:04 ` Johannes Schindelin
2019-04-10 17:37 ` [PATCH 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-04-10 17:37 ` [PATCH 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-04-10 17:37 ` [PATCH 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-04-10 17:37 ` [PATCH 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-05-13 17:27 ` [PATCH v2 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-05-13 17:28 ` [PATCH v2 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-05-13 17:28 ` [PATCH v2 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-05-13 17:28 ` [PATCH v2 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-07-31 17:52 ` Junio C Hamano
2019-08-26 21:26 ` Johannes Schindelin
2019-08-27 22:25 ` Junio C Hamano
2019-08-28 15:06 ` Johannes Schindelin
2019-08-28 15:37 ` Junio C Hamano
2019-07-16 14:58 ` [PATCH v3 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-07-31 17:59 ` Junio C Hamano
2019-08-27 9:22 ` Johannes Schindelin
2019-07-16 14:58 ` [PATCH v3 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-07-31 18:12 ` Junio C Hamano
2019-08-27 10:04 ` Johannes Schindelin
2019-07-16 14:58 ` [PATCH v3 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 05/11] built-in add -i: color the header in the `status` command Johannes Schindelin via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-07-31 18:14 ` Junio C Hamano
2019-07-16 14:58 ` [PATCH v3 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-07-31 18:39 ` Junio C Hamano
2019-08-24 12:38 ` SZEDER Gábor
2019-08-27 12:14 ` Johannes Schindelin
2019-08-28 16:30 ` SZEDER Gábor
2019-08-28 16:34 ` [PATCH] [PoC] A simpler find_unique_prefixes() implementation SZEDER Gábor
2019-08-30 20:12 ` Johannes Schindelin
2019-07-16 14:58 ` [PATCH v3 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-07-16 14:58 ` [PATCH v3 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-08-02 21:04 ` Junio C Hamano
2019-08-02 22:26 ` Jeff King
2019-07-16 18:38 ` [PATCH v3 00/11] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin
2019-08-02 21:06 ` Junio C Hamano
2019-08-27 12:57 ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 01/11] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 02/11] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 04/11] built-in add -i: refresh the index before running `status` Johannes Schindelin via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 03/11] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 05/11] built-in add -i: color the header in " Johannes Schindelin via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 07/11] Add a function to determine unique prefixes for a list of strings Slavica Djukic via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 06/11] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-08-27 12:57 ` [PATCH v4 08/11] built-in add -i: show unique prefixes of the commands Slavica Djukic via GitGitGadget
2019-08-27 12:58 ` [PATCH v4 09/11] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-08-27 12:58 ` [PATCH v4 11/11] built-in add -i: implement the `help` command Johannes Schindelin via GitGitGadget
2019-08-27 12:58 ` [PATCH v4 10/11] built-in add -i: use color in the main loop Slavica Djukic via GitGitGadget
2019-11-04 12:15 ` [PATCH v5 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-11-04 12:15 ` [PATCH v5 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-08 4:49 ` Junio C Hamano
2019-11-09 11:06 ` Johannes Schindelin
2019-11-10 7:18 ` Junio C Hamano
2019-11-11 9:15 ` Johannes Schindelin
2019-11-11 12:09 ` Junio C Hamano
2019-11-12 15:03 ` Johannes Schindelin
2019-11-13 3:54 ` Junio C Hamano
2019-11-13 12:30 ` Johannes Schindelin
2019-11-13 14:01 ` Junio C Hamano
2019-11-04 12:15 ` [PATCH v5 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-08 4:56 ` Junio C Hamano
2019-11-04 12:15 ` [PATCH v5 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-08 5:01 ` Junio C Hamano
2019-11-04 12:15 ` [PATCH v5 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-04 12:15 ` [PATCH v5 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-08 5:17 ` Junio C Hamano
2019-11-09 11:21 ` Johannes Schindelin
2019-11-04 12:15 ` [PATCH v5 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-04 12:15 ` [PATCH v5 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-04 12:15 ` [PATCH v5 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-04 12:15 ` [PATCH v5 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
2019-11-13 12:40 ` [PATCH v6 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin via GitGitGadget
2019-11-13 12:40 ` [PATCH v6 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-14 2:15 ` Junio C Hamano
2019-11-14 15:07 ` Johannes Schindelin
2019-11-15 4:35 ` Junio C Hamano
2019-11-13 12:40 ` [PATCH v6 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-13 12:40 ` [PATCH v6 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-13 12:41 ` [PATCH v6 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-13 12:41 ` [PATCH v6 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-13 12:41 ` [PATCH v6 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-13 12:41 ` [PATCH v6 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-13 12:41 ` [PATCH v6 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-13 12:41 ` [PATCH v6 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
2019-11-13 12:46 ` [PATCH v6 0/9] git add -i: add a rudimentary version in C (supporting only status and help so far) Johannes Schindelin
2019-11-15 11:11 ` [PATCH v7 " Johannes Schindelin via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 1/9] Start to implement a built-in version of `git add --interactive` Johannes Schindelin via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 2/9] diff: export diffstat interface Daniel Ferreira via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 3/9] built-in add -i: implement the `status` command Daniel Ferreira via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 4/9] built-in add -i: color the header in " Slavica Đukić via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 5/9] built-in add -i: implement the main loop Johannes Schindelin via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 6/9] built-in add -i: show unique prefixes of the commands Johannes Schindelin via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 7/9] built-in add -i: support `?` (prompt help) Johannes Schindelin via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 8/9] built-in add -i: use color in the main loop Slavica Đukić via GitGitGadget
2019-11-15 11:11 ` [PATCH v7 9/9] built-in add -i: implement the `help` command Slavica Đukić via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.QRO.7.76.6.1905131405210.44@tvgsbejvaqbjf.bet \
--to=johannes.schindelin@gmx.de \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=slawica92@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).