git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] string-list: make compare function compatible with qsort(3)
@ 2016-12-21  9:36 René Scharfe
  2016-12-21 16:12 ` Jeff King
  2016-12-21 18:10 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: René Scharfe @ 2016-12-21  9:36 UTC (permalink / raw
  To: Git List; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

The cmp member of struct string_list points to a comparison function
that is used for sorting and searching of list items.  It takes two
string pointers -- like strcmp(3), which is in fact the default;
cmp_items() provides a qsort(3) compatible interface by passing the
string members of two struct string_list_item pointers to cmp.

One shortcoming is that the comparison function is restricted to working
with the string members of items; util is inaccessible to it.  Another
one is that the value of cmp is passed in a global variable to
cmp_items(), making string_list_sort() non-reentrant.

Remove the intermediate layer, i.e. cmp_items(), make the comparison
functions compatible with qsort(3) and pass them pointers to full items.
This allows comparisons to also take the util member into account, and
avoids the need to pass the real comparison function to an intermediate
function, removing the need for a global function.

A downside is that comparison functions take void pointers now and each
of them needs to cast its arguments to struct string_list_item pointers,
weakening type safety and adding some repetitive code.  Programmers are
used to that, however, as that's par for the course with qsort(3).

Also two unsightly casts are added that remove the const qualifiers of
strings while building the structs to pass to the comparison function as
search keys.  It should be safe, though, as we only ever use them for
reading.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
Alternative approach to the qsort_s()-based series "string-list: make
string_list_sort() reentrant".

 Documentation/technical/api-string-list.txt |  6 +++--
 builtin/mailsplit.c                         |  5 +++-
 mailmap.c                                   |  5 ++--
 merge-recursive.c                           |  4 ++-
 string-list.c                               | 39 +++++++++++++++--------------
 string-list.h                               |  4 +--
 tmp-objdir.c                                |  4 ++-
 7 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt
index c08402b12..39eac59c7 100644
--- a/Documentation/technical/api-string-list.txt
+++ b/Documentation/technical/api-string-list.txt
@@ -205,5 +205,7 @@ Represents the list itself.
   You should not tamper with it.
 . Setting the `strdup_strings` member to 1 will strdup() the strings
   before adding them, see above.
-. The `compare_strings_fn` member is used to specify a custom compare
-  function, otherwise `strcmp()` is used as the default function.
+. The `cmp` member is used to specify a custom compare function. It has
+  the same signature as the one for qsort(1) and is passed two pointers
+  to `struct string_list_item`. If it's NULL then the `string` members
+  are compared with `strcmp(1)`; this is the default.
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 30681681c..4e72e3128 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -147,8 +147,11 @@ static int populate_maildir_list(struct string_list *list, const char *path)
 	return ret;
 }
 
-static int maildir_filename_cmp(const char *a, const char *b)
+static int maildir_filename_cmp(const void *one, const void *two)
 {
+	const struct string_list_item *item_one = one, *item_two = two;
+	const char *a = item_one->string, *b = item_two->string;
+
 	while (*a && *b) {
 		if (isdigit(*a) && isdigit(*b)) {
 			long int na, nb;
diff --git a/mailmap.c b/mailmap.c
index c1a79c100..5290b5153 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -61,9 +61,10 @@ static void free_mailmap_entry(void *p, const char *s)
  * namemap.cmp until we know no systems that matter have such an
  * "unusual" string.h.
  */
-static int namemap_cmp(const char *a, const char *b)
+static int namemap_cmp(const void *one, const void *two)
 {
-	return strcasecmp(a, b);
+	const struct string_list_item *item_one = one, *item_two = two;
+	return strcasecmp(item_one->string, item_two->string);
 }
 
 static void add_mapping(struct string_list *map,
diff --git a/merge-recursive.c b/merge-recursive.c
index d32720944..4683ba43f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -390,8 +390,10 @@ static struct string_list *get_unmerged(void)
 	return unmerged;
 }
 
-static int string_list_df_name_compare(const char *one, const char *two)
+static int string_list_df_name_compare(const void *a, const void *b)
 {
+	const struct string_list_item *item_a = a, *item_b = b;
+	const char *one = item_a->string, *two = item_b->string;
 	int onelen = strlen(one);
 	int twolen = strlen(two);
 	/*
diff --git a/string-list.c b/string-list.c
index 8c83cac18..c583a04ee 100644
--- a/string-list.c
+++ b/string-list.c
@@ -7,17 +7,26 @@ void string_list_init(struct string_list *list, int strdup_strings)
 	list->strdup_strings = strdup_strings;
 }
 
+static int string_list_item_strcmp(const void *one, const void *two)
+{
+	const struct string_list_item *item_one = one, *item_two = two;
+	return strcmp(item_one->string, item_two->string);
+}
+
 /* if there is no exact match, point to the index where the entry could be
  * inserted */
 static int get_entry_index(const struct string_list *list, const char *string,
 		int *exact_match)
 {
 	int left = -1, right = list->nr;
-	compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
+	compare_fn_t cmp = list->cmp ? list->cmp : string_list_item_strcmp;
+	struct string_list_item key = { NULL };
+
+	key.string = (char *)string;
 
 	while (left + 1 < right) {
 		int middle = (left + right) / 2;
-		int compare = cmp(string, list->items[middle].string);
+		int compare = cmp(&key, &list->items[middle]);
 		if (compare < 0)
 			right = middle;
 		else if (compare > 0)
@@ -94,11 +103,11 @@ struct string_list_item *string_list_lookup(struct string_list *list, const char
 
 void string_list_remove_duplicates(struct string_list *list, int free_util)
 {
+	compare_fn_t cmp = list->cmp ? list->cmp : string_list_item_strcmp;
 	if (list->nr > 1) {
 		int src, dst;
-		compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
 		for (src = dst = 1; src < list->nr; src++) {
-			if (!cmp(list->items[dst - 1].string, list->items[src].string)) {
+			if (!cmp(&list->items[dst - 1], &list->items[src])) {
 				if (list->strdup_strings)
 					free(list->items[src].string);
 				if (free_util)
@@ -211,31 +220,23 @@ struct string_list_item *string_list_append(struct string_list *list,
 			list->strdup_strings ? xstrdup(string) : (char *)string);
 }
 
-/* Yuck */
-static compare_strings_fn compare_for_qsort;
-
-/* Only call this from inside string_list_sort! */
-static int cmp_items(const void *a, const void *b)
-{
-	const struct string_list_item *one = a;
-	const struct string_list_item *two = b;
-	return compare_for_qsort(one->string, two->string);
-}
-
 void string_list_sort(struct string_list *list)
 {
-	compare_for_qsort = list->cmp ? list->cmp : strcmp;
-	QSORT(list->items, list->nr, cmp_items);
+	compare_fn_t cmp = list->cmp ? list->cmp : string_list_item_strcmp;
+	QSORT(list->items, list->nr, cmp);
 }
 
 struct string_list_item *unsorted_string_list_lookup(struct string_list *list,
 						     const char *string)
 {
 	struct string_list_item *item;
-	compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
+	compare_fn_t cmp = list->cmp ? list->cmp : string_list_item_strcmp;
+	struct string_list_item key = { NULL };
+
+	key.string = (char *)string;
 
 	for_each_string_list_item(item, list)
-		if (!cmp(string, item->string))
+		if (!cmp(&key, item))
 			return item;
 	return NULL;
 }
diff --git a/string-list.h b/string-list.h
index d3809a141..073025ddc 100644
--- a/string-list.h
+++ b/string-list.h
@@ -6,13 +6,13 @@ struct string_list_item {
 	void *util;
 };
 
-typedef int (*compare_strings_fn)(const char *, const char *);
+typedef int (*compare_fn_t)(const void *, const void *);
 
 struct string_list {
 	struct string_list_item *items;
 	unsigned int nr, alloc;
 	unsigned int strdup_strings:1;
-	compare_strings_fn cmp; /* NULL uses strcmp() */
+	compare_fn_t cmp; /* NULL uses strcmp() on ->string */
 };
 
 #define STRING_LIST_INIT_NODUP { NULL, 0, 0, 0, NULL }
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 64435f23a..b6209b199 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -173,8 +173,10 @@ static int pack_copy_priority(const char *name)
 	return 4;
 }
 
-static int pack_copy_cmp(const char *a, const char *b)
+static int pack_copy_cmp(const void *one, const void *two)
 {
+	const struct string_list_item *item_one = one, *item_two = two;
+	const char *a = item_one->string, *b = item_two->string;
 	return pack_copy_priority(a) - pack_copy_priority(b);
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] string-list: make compare function compatible with qsort(3)
  2016-12-21  9:36 [PATCH] string-list: make compare function compatible with qsort(3) René Scharfe
@ 2016-12-21 16:12 ` Jeff King
  2016-12-29 13:53   ` René Scharfe
  2016-12-21 18:10 ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff King @ 2016-12-21 16:12 UTC (permalink / raw
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Wed, Dec 21, 2016 at 10:36:41AM +0100, René Scharfe wrote:

> One shortcoming is that the comparison function is restricted to working
> with the string members of items; util is inaccessible to it.  Another
> one is that the value of cmp is passed in a global variable to
> cmp_items(), making string_list_sort() non-reentrant.

I think this approach is OK for string_list, but it doesn't help the
general case that wants qsort_s() to actually access global data. I
don't know how common that is in our codebase, though.

So I'm fine with it, but I think we might eventually need to revisit the
qsort_s() thing anyway.

> Remove the intermediate layer, i.e. cmp_items(), make the comparison
> functions compatible with qsort(3) and pass them pointers to full items.
> This allows comparisons to also take the util member into account, and
> avoids the need to pass the real comparison function to an intermediate
> function, removing the need for a global function.

I'm not sure if access to the util field is really of any value, after
looking at it in:

  http://public-inbox.org/git/20161125171546.fa3zpapbjngjcl26@sigill.intra.peff.net/

Though note that if we do take this patch, there are probably one or two
spots that could switch from QSORT() to string_list_sort().

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] string-list: make compare function compatible with qsort(3)
  2016-12-21  9:36 [PATCH] string-list: make compare function compatible with qsort(3) René Scharfe
  2016-12-21 16:12 ` Jeff King
@ 2016-12-21 18:10 ` Junio C Hamano
  2016-12-21 20:47   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2016-12-21 18:10 UTC (permalink / raw
  To: René Scharfe; +Cc: Git List, Jeff King, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> The cmp member of struct string_list points to a comparison function
> that is used for sorting and searching of list items.  It takes two
> string pointers -- like strcmp(3), which is in fact the default;
> cmp_items() provides a qsort(3) compatible interface by passing the
> string members of two struct string_list_item pointers to cmp.
>
> One shortcoming is that the comparison function is restricted to working
> with the string members of items; util is inaccessible to it.  Another
> one is that the value of cmp is passed in a global variable to
> cmp_items(), making string_list_sort() non-reentrant.

I think it is insane to make util accessible to the comparison
function in the first place.

A string-list is primarily a table of strings that can be used to
quickly look up a string in it (or detect absense of it), and
optionally set and get the data associated to that string.  If you
allow the comparison function to take anything other than the string
itself into account, you can no longer binary search unless you
force callers to specify what to put in util when a string is added
to the table, and you also remove the ability to modify util once a
string is added to the table.

The string-list API exposes the "append without sorting and then
sort before starting to look up efficiently with a binary search",
and I think that is its biggest misdesign.  Such an optimization
would have been hidden from callers in a correctly designed API by
making sure sorting happens lazily when a function that wants to see
a sorted nature of the list for the first time, but somehow we ended
up with an API with separate functions _insert() and _append() with
an explicit _sort().  It then leads to unsorted_*_lookup() and
similar mess, that imply that a string-list can be used not as a
look-up table but just an unordered bag of items.  In our attempt to
make it serve as these two quite different things, it has become
good API for neither of its two uses.  The caller is forced to know
when the list is not sorted and unsorted_* variant must be used, for
example.  "Perhaps it makes it even more flexible if we made util
available to ordering decision" is a line of thinking that makes it
even worse.

I do agree that non-reentrancy is an issue that is worth solving,
though.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] string-list: make compare function compatible with qsort(3)
  2016-12-21 18:10 ` Junio C Hamano
@ 2016-12-21 20:47   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2016-12-21 20:47 UTC (permalink / raw
  To: René Scharfe; +Cc: Git List, Jeff King, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> I do agree that non-reentrancy is an issue that is worth solving,
> though.

I sounded overly negative, but I do not think of any way other than
this patch to solve the reentrancy issue without using qsort_s().
Between the two, my preference is still the latter, but I could be
persuaded, I would guess.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] string-list: make compare function compatible with qsort(3)
  2016-12-21 16:12 ` Jeff King
@ 2016-12-29 13:53   ` René Scharfe
  0 siblings, 0 replies; 5+ messages in thread
From: René Scharfe @ 2016-12-29 13:53 UTC (permalink / raw
  To: Jeff King; +Cc: Git List, Junio C Hamano, Johannes Schindelin

Am 21.12.2016 um 17:12 schrieb Jeff King:
> On Wed, Dec 21, 2016 at 10:36:41AM +0100, René Scharfe wrote:
>
>> One shortcoming is that the comparison function is restricted to working
>> with the string members of items; util is inaccessible to it.  Another
>> one is that the value of cmp is passed in a global variable to
>> cmp_items(), making string_list_sort() non-reentrant.
>
> I think this approach is OK for string_list, but it doesn't help the
> general case that wants qsort_s() to actually access global data. I
> don't know how common that is in our codebase, though.
>
> So I'm fine with it, but I think we might eventually need to revisit the
> qsort_s() thing anyway.

I have to admit I didn't even consider the possibility that the pattern 
of accessing global variables in qsort(1) compare functions could have 
spread.

And indeed, at least ref-filter.c::compare_refs() and 
worktree.c::compare_worktree() so that as well.  The latter uses 
fspathcmp(), which is OK as ignore_case is only set once when reading 
the config, but the first one looks, well, interesting.  Perhaps a 
single ref filter per program is enough?

Anyway, that's a good enough argument for me for adding that newfangled 
C11 function after all..

Btw.: Found with

   git grep 'QSORT.*;$' |
   sed 's/.* /int /; s/);//' |
   sort -u |
   git grep -Ww -f-

and

   git grep -A1 'QSORT.*,$'

>> Remove the intermediate layer, i.e. cmp_items(), make the comparison
>> functions compatible with qsort(3) and pass them pointers to full items.
>> This allows comparisons to also take the util member into account, and
>> avoids the need to pass the real comparison function to an intermediate
>> function, removing the need for a global function.
>
> I'm not sure if access to the util field is really of any value, after
> looking at it in:
>
>   http://public-inbox.org/git/20161125171546.fa3zpapbjngjcl26@sigill.intra.peff.net/
>
> Though note that if we do take this patch, there are probably one or two
> spots that could switch from QSORT() to string_list_sort().

Yes, but as you noted in that thread there is not much point in doing 
that; the only net win is that we can pass a list as a single pointer 
instead of as base pointer and element count -- the special compare 
function needs to be specified anyway (once), somehow.

René

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-12-29 13:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-21  9:36 [PATCH] string-list: make compare function compatible with qsort(3) René Scharfe
2016-12-21 16:12 ` Jeff King
2016-12-29 13:53   ` René Scharfe
2016-12-21 18:10 ` Junio C Hamano
2016-12-21 20:47   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).