git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Use string_lists when processing notes
@ 2012-11-04  7:07 Michael Haggerty
  2012-11-04  7:07 ` [PATCH 1/5] string_list: add a function string_list_remove_empty_items() Michael Haggerty
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Michael Haggerty @ 2012-11-04  7:07 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty

This simplifies the code.  Also, sort lines all at once (O(N lg N))
rather than insertion sorting as lines are processed (O(N^2)) and fix
the handling of empty values in GIT_NOTES_DISPLAY_REF and
GIT_NOTES_REWRITE_REF.

Michael Haggerty (5):
  string_list: add a function string_list_remove_empty_items()
  Initialize sort_uniq_list using named constant
  combine_notes_cat_sort_uniq(): sort and dedup lines all at once
  notes: fix handling of colon-separated values
  string_list_add_refs_from_colon_sep(): use string_list_split()

 Documentation/technical/api-string-list.txt |  9 ++++-
 notes.c                                     | 61 ++++++++++++-----------------
 string-list.c                               |  9 +++++
 string-list.h                               |  7 ++++
 4 files changed, 49 insertions(+), 37 deletions(-)

-- 
1.8.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] string_list: add a function string_list_remove_empty_items()
  2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
@ 2012-11-04  7:07 ` Michael Haggerty
  2012-11-04  7:07 ` [PATCH 2/5] Initialize sort_uniq_list using named constant Michael Haggerty
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2012-11-04  7:07 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Documentation/technical/api-string-list.txt | 9 ++++++++-
 string-list.c                               | 9 +++++++++
 string-list.h                               | 7 +++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt
index 94d7a2b..7386bca 100644
--- a/Documentation/technical/api-string-list.txt
+++ b/Documentation/technical/api-string-list.txt
@@ -38,7 +38,8 @@ member (you need this if you add things later) and you should set the
   `unsorted_string_list_delete_item`.
 
 . Can remove items not matching a criterion from a sorted or unsorted
-  list using `filter_string_list`.
+  list using `filter_string_list`, or remove empty strings using
+  `string_list_remove_empty_items`.
 
 . Finally it should free the list using `string_list_clear`.
 
@@ -75,6 +76,12 @@ Functions
 	to be deleted.  Preserve the order of the items that are
 	retained.
 
+`string_list_remove_empty_items`::
+
+	Remove any empty strings from the list.  If free_util is true,
+	call free() on the util members of any items that have to be
+	deleted.  Preserve the order of the items that are retained.
+
 `string_list_longest_prefix`::
 
 	Return the longest string within a string_list that is a
diff --git a/string-list.c b/string-list.c
index c54b816..397e6cf 100644
--- a/string-list.c
+++ b/string-list.c
@@ -136,6 +136,15 @@ void filter_string_list(struct string_list *list, int free_util,
 	list->nr = dst;
 }
 
+static int item_is_not_empty(struct string_list_item *item, void *unused)
+{
+	return *item->string != '\0';
+}
+
+void string_list_remove_empty_items(struct string_list *list, int free_util) {
+	filter_string_list(list, free_util, item_is_not_empty, NULL);
+}
+
 char *string_list_longest_prefix(const struct string_list *prefixes,
 				 const char *string)
 {
diff --git a/string-list.h b/string-list.h
index 5efd07b..c50b0d0 100644
--- a/string-list.h
+++ b/string-list.h
@@ -39,6 +39,13 @@ void filter_string_list(struct string_list *list, int free_util,
 			string_list_each_func_t want, void *cb_data);
 
 /*
+ * Remove any empty strings from the list.  If free_util is true, call
+ * free() on the util members of any items that have to be deleted.
+ * Preserve the order of the items that are retained.
+ */
+void string_list_remove_empty_items(struct string_list *list, int free_util);
+
+/*
  * Return the longest string in prefixes that is a prefix (in the
  * sense of prefixcmp()) of string, or NULL if no such prefix exists.
  * This function does not require the string_list to be sorted (it
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] Initialize sort_uniq_list using named constant
  2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
  2012-11-04  7:07 ` [PATCH 1/5] string_list: add a function string_list_remove_empty_items() Michael Haggerty
@ 2012-11-04  7:07 ` Michael Haggerty
  2012-11-04  7:07 ` [PATCH 3/5] combine_notes_cat_sort_uniq(): sort and dedup lines all at once Michael Haggerty
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2012-11-04  7:07 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notes.c b/notes.c
index bc454e1..8652f8f 100644
--- a/notes.c
+++ b/notes.c
@@ -901,7 +901,7 @@ static int string_list_join_lines_helper(struct string_list_item *item,
 int combine_notes_cat_sort_uniq(unsigned char *cur_sha1,
 		const unsigned char *new_sha1)
 {
-	struct string_list sort_uniq_list = { NULL, 0, 0, 1 };
+	struct string_list sort_uniq_list = STRING_LIST_INIT_DUP;
 	struct strbuf buf = STRBUF_INIT;
 	int ret = 1;
 
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/5] combine_notes_cat_sort_uniq(): sort and dedup lines all at once
  2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
  2012-11-04  7:07 ` [PATCH 1/5] string_list: add a function string_list_remove_empty_items() Michael Haggerty
  2012-11-04  7:07 ` [PATCH 2/5] Initialize sort_uniq_list using named constant Michael Haggerty
@ 2012-11-04  7:07 ` Michael Haggerty
  2012-11-04  7:07 ` [PATCH 4/5] notes: fix handling of colon-separated values Michael Haggerty
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2012-11-04  7:07 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty

Instead of reading lines one by one and insertion-sorting them into a
string_list, read all of the lines, sort them, then remove duplicates.
Aside from being less code, this reduces the complexity from O(N^2) to
O(N lg N) in the total number of lines.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/notes.c b/notes.c
index 8652f8f..62f8f6f 100644
--- a/notes.c
+++ b/notes.c
@@ -848,15 +848,16 @@ int combine_notes_ignore(unsigned char *cur_sha1,
 	return 0;
 }
 
-static int string_list_add_note_lines(struct string_list *sort_uniq_list,
+/*
+ * Add the lines from the named object to list, with trailing
+ * newlines removed.
+ */
+static int string_list_add_note_lines(struct string_list *list,
 				      const unsigned char *sha1)
 {
 	char *data;
 	unsigned long len;
 	enum object_type t;
-	struct strbuf buf = STRBUF_INIT;
-	struct strbuf **lines = NULL;
-	int i, list_index;
 
 	if (is_null_sha1(sha1))
 		return 0;
@@ -868,24 +869,14 @@ static int string_list_add_note_lines(struct string_list *sort_uniq_list,
 		return t != OBJ_BLOB || !data;
 	}
 
-	strbuf_attach(&buf, data, len, len + 1);
-	lines = strbuf_split(&buf, '\n');
-
-	for (i = 0; lines[i]; i++) {
-		if (lines[i]->buf[lines[i]->len - 1] == '\n')
-			strbuf_setlen(lines[i], lines[i]->len - 1);
-		if (!lines[i]->len)
-			continue; /* skip empty lines */
-		list_index = string_list_find_insert_index(sort_uniq_list,
-							   lines[i]->buf, 0);
-		if (list_index < 0)
-			continue; /* skip duplicate lines */
-		string_list_insert_at_index(sort_uniq_list, list_index,
-					    lines[i]->buf);
-	}
-
-	strbuf_list_free(lines);
-	strbuf_release(&buf);
+	/*
+	 * If the last line of the file is EOL-terminated, this will
+	 * add an empty string to the list.  But it will be removed
+	 * later, along with any empty strings that came from empty
+	 * lines within the file.
+	 */
+	string_list_split(list, data, '\n', -1);
+	free(data);
 	return 0;
 }
 
@@ -910,6 +901,9 @@ int combine_notes_cat_sort_uniq(unsigned char *cur_sha1,
 		goto out;
 	if (string_list_add_note_lines(&sort_uniq_list, new_sha1))
 		goto out;
+	string_list_remove_empty_items(&sort_uniq_list, 0);
+	sort_string_list(&sort_uniq_list);
+	string_list_remove_duplicates(&sort_uniq_list, 0);
 
 	/* create a new blob object from sort_uniq_list */
 	if (for_each_string_list(&sort_uniq_list,
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] notes: fix handling of colon-separated values
  2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
                   ` (2 preceding siblings ...)
  2012-11-04  7:07 ` [PATCH 3/5] combine_notes_cat_sort_uniq(): sort and dedup lines all at once Michael Haggerty
@ 2012-11-04  7:07 ` Michael Haggerty
  2012-11-04  7:07 ` [PATCH 5/5] string_list_add_refs_from_colon_sep(): use string_list_split() Michael Haggerty
       [not found] ` <CALKQrgebzH5vJUQVNxTks0Nq_3OZBWrb-cLDkABxnGJJqfB7gQ@mail.gmail.com>
  5 siblings, 0 replies; 8+ messages in thread
From: Michael Haggerty @ 2012-11-04  7:07 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty

The substrings output by strbuf_split() include the ':' delimiters.
When processing GIT_NOTES_DISPLAY_REF and GIT_NOTES_REWRITE_REF, strip
off the delimiter character *before* checking whether the substring is
empty rather than after, so that empty strings within the list are
also skipped.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/notes.c b/notes.c
index 62f8f6f..63b2a09 100644
--- a/notes.c
+++ b/notes.c
@@ -951,10 +951,10 @@ void string_list_add_refs_from_colon_sep(struct string_list *list,
 	split = strbuf_split(&globbuf, ':');
 
 	for (i = 0; split[i]; i++) {
+		if (split[i]->len && split[i]->buf[split[i]->len-1] == ':')
+			strbuf_setlen(split[i], split[i]->len-1);
 		if (!split[i]->len)
 			continue;
-		if (split[i]->buf[split[i]->len-1] == ':')
-			strbuf_setlen(split[i], split[i]->len-1);
 		string_list_add_refs_by_glob(list, split[i]->buf);
 	}
 
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] string_list_add_refs_from_colon_sep(): use string_list_split()
  2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
                   ` (3 preceding siblings ...)
  2012-11-04  7:07 ` [PATCH 4/5] notes: fix handling of colon-separated values Michael Haggerty
@ 2012-11-04  7:07 ` Michael Haggerty
  2012-11-04 11:57   ` Jeff King
       [not found] ` <CALKQrgebzH5vJUQVNxTks0Nq_3OZBWrb-cLDkABxnGJJqfB7gQ@mail.gmail.com>
  5 siblings, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2012-11-04  7:07 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty

It makes for simpler code than strbuf_split().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/notes.c b/notes.c
index 63b2a09..b823701 100644
--- a/notes.c
+++ b/notes.c
@@ -943,23 +943,18 @@ void string_list_add_refs_by_glob(struct string_list *list, const char *glob)
 void string_list_add_refs_from_colon_sep(struct string_list *list,
 					 const char *globs)
 {
-	struct strbuf globbuf = STRBUF_INIT;
-	struct strbuf **split;
+	struct string_list split = STRING_LIST_INIT_NODUP;
+	char *globs_copy = xstrdup(globs);
 	int i;
 
-	strbuf_addstr(&globbuf, globs);
-	split = strbuf_split(&globbuf, ':');
+	string_list_split_in_place(&split, globs_copy, ':', -1);
+	string_list_remove_empty_items(&split, 0);
 
-	for (i = 0; split[i]; i++) {
-		if (split[i]->len && split[i]->buf[split[i]->len-1] == ':')
-			strbuf_setlen(split[i], split[i]->len-1);
-		if (!split[i]->len)
-			continue;
-		string_list_add_refs_by_glob(list, split[i]->buf);
-	}
+	for (i = 0; i < split.nr; i++)
+		string_list_add_refs_by_glob(list, split.items[i].string);
 
-	strbuf_list_free(split);
-	strbuf_release(&globbuf);
+	string_list_clear(&split, 0);
+	free(globs_copy);
 }
 
 static int notes_display_config(const char *k, const char *v, void *cb)
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] string_list_add_refs_from_colon_sep(): use string_list_split()
  2012-11-04  7:07 ` [PATCH 5/5] string_list_add_refs_from_colon_sep(): use string_list_split() Michael Haggerty
@ 2012-11-04 11:57   ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2012-11-04 11:57 UTC (permalink / raw
  To: Michael Haggerty; +Cc: Junio C Hamano, git

On Sun, Nov 04, 2012 at 08:07:10AM +0100, Michael Haggerty wrote:

> It makes for simpler code than strbuf_split().

Agreed.

I wonder how useful the strbuf_split functions really are. Callers might
care about splitting content in a strbuf, but in general, getting a list
of strbufs out is just a hassle, and a string_list makes more sense.

It's probably not worth spending time converting them unless we happen
to be working in the area, though.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5] Use string_lists when processing notes
       [not found]   ` <5098C29A.4010901@alum.mit.edu>
@ 2012-11-06 14:20     ` Johan Herland
  0 siblings, 0 replies; 8+ messages in thread
From: Johan Herland @ 2012-11-06 14:20 UTC (permalink / raw
  To: Michael Haggerty; +Cc: Git mailing list

On Tue, Nov 6, 2012 at 8:56 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 11/04/2012 10:05 PM, Johan Herland wrote:
>> On Sun, Nov 4, 2012 at 8:07 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> This simplifies the code.  Also, sort lines all at once (O(N lg N))
>>> rather than insertion sorting as lines are processed (O(N^2)) and fix
>>> the handling of empty values in GIT_NOTES_DISPLAY_REF and
>>> GIT_NOTES_REWRITE_REF.
>>>
>>> Michael Haggerty (5):
>>>   string_list: add a function string_list_remove_empty_items()
>>>   Initialize sort_uniq_list using named constant
>>>   combine_notes_cat_sort_uniq(): sort and dedup lines all at once
>>>   notes: fix handling of colon-separated values
>>>   string_list_add_refs_from_colon_sep(): use string_list_split()
>>
>> Series looks good to me.
>>
>> Acked-by: Johan Herland <johan@herland.net>
>
> Thanks for reviewing the series.
>
> Was it intentional that you didn't CC the mailing list?

No, not at all. Sorry.

Here we go again, with the appropriate CC:

Acked-by: Johan Herland <johan@herland.net>


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-11-06 14:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
2012-11-04  7:07 ` [PATCH 1/5] string_list: add a function string_list_remove_empty_items() Michael Haggerty
2012-11-04  7:07 ` [PATCH 2/5] Initialize sort_uniq_list using named constant Michael Haggerty
2012-11-04  7:07 ` [PATCH 3/5] combine_notes_cat_sort_uniq(): sort and dedup lines all at once Michael Haggerty
2012-11-04  7:07 ` [PATCH 4/5] notes: fix handling of colon-separated values Michael Haggerty
2012-11-04  7:07 ` [PATCH 5/5] string_list_add_refs_from_colon_sep(): use string_list_split() Michael Haggerty
2012-11-04 11:57   ` Jeff King
     [not found] ` <CALKQrgebzH5vJUQVNxTks0Nq_3OZBWrb-cLDkABxnGJJqfB7gQ@mail.gmail.com>
     [not found]   ` <5098C29A.4010901@alum.mit.edu>
2012-11-06 14:20     ` [PATCH 0/5] Use string_lists when processing notes Johan Herland

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).