[PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization
@ 2021-08-17  7:14 ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 1/5] [GSOC] ref-filter: skip parse_object_buffer in some cases ZheNing Hu via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-08-17  7:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu

Last time I submitted a very long patch series:
https://lore.kernel.org/git/pull.1016.git.1628842990.gitgitgadget@gmail.com/
My mentor Christian suggested to split the performance optimization part
out, so this patch series used to optimize performance optimization in
ref-filter.

Changes in this patch series:

 1. Skip parse_object_buffer(), which can reduce object content parsing.
 2. Use linked list to record parsing results, which can reduce second
    format string parsing.
 3. Reuse final buffer, which can reduce memcpy();
 4. Add a need_get_object_info flag instead of compare two object_info,
    which can reduce memcmp();
 5. Use ALLOC_ARRAY() instead of CALLOC_ARRAY(), which can reduce memset();

Since the relevant part of git cat-file --batch has been deleted, the
execution time of git for-each-ref is very short, I haven’t added
performance tests yet for git for-each-ref.

ZheNing Hu (5):
  [GSOC] ref-filter: skip parse_object_buffer in some cases
  [GSOC] ref-filter: remove second parsing in format_ref_array_item
  [GSOC] ref-filter: reuse final buffer
  [GSOC] ref-filter: reduce unnecessary object_info comparisons
  [GSOC]: ref-filter: instead CALLOC_ARRAY to ALLOC_ARRAY

 builtin/branch.c       |   2 +
 builtin/for-each-ref.c |   3 +-
 builtin/tag.c          |   2 +
 builtin/verify-tag.c   |   2 +
 ref-filter.c           | 168 ++++++++++++++++++++++++++++++++---------
 ref-filter.h           |  17 ++++-
 6 files changed, 154 insertions(+), 40 deletions(-)


base-commit: f000ecbed922c727382651490e75014f003c89ca
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1020%2Fadlternative%2Fref-filter-opt-perf-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1020/adlternative/ref-filter-opt-perf-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1020
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/5] [GSOC] ref-filter: skip parse_object_buffer in some cases
  2021-08-17  7:14 [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization ZheNing Hu via GitGitGadget
@ 2021-08-17  7:14 ` ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 2/5] [GSOC] ref-filter: remove second parsing in format_ref_array_item ZheNing Hu via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-08-17  7:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

When we are using some atoms such as %(raw), %(objectname),
%(objecttype), we don't need to parse the content of the object,
so we can skip parse_object_buffer() for performance optimization.

It is worth noting that in these cases, we still need to call
parse_object_buffer() for parsing:

1. The atom type is one of %(tag), %(type), %(object), %(tree),
%(numparent) or %(parent).
2. The type of the object is tag and the atom need to be
dereferenced e.g. %(*objecttype).

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 49 +++++++++++++++++++++++++++++++++++++------------
 ref-filter.h |  5 +++--
 2 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 93ce2a6ef2e..65ba00633dc 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1009,6 +1009,20 @@ static int reject_atom(enum atom_type atom_type)
 	return atom_type == ATOM_REST;
 }
 
+static int need_parse_buffer(enum atom_type atom_type) {
+	switch (atom_type) {
+	case ATOM_TAG:
+	case ATOM_TYPE:
+	case ATOM_OBJECT:
+	case ATOM_TREE:
+	case ATOM_NUMPARENT:
+	case ATOM_PARENT:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 /*
  * Make sure the format string is well formed, and parse out
  * the used atoms.
@@ -1029,6 +1043,8 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (need_parse_buffer(used_atom[at].atom_type))
+			format->can_skip_parse_buffer = 0;
 		if (reject_atom(used_atom[at].atom_type))
 			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
 
@@ -1524,14 +1540,16 @@ static void grab_values(struct atom_value *val, int deref, struct object *obj, s
 {
 	void *buf = data->content;
 
-	switch (obj->type) {
+	switch (data->type) {
 	case OBJ_TAG:
-		grab_tag_values(val, deref, obj);
+		if (obj)
+			grab_tag_values(val, deref, obj);
 		grab_sub_body_contents(val, deref, data);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
-		grab_commit_values(val, deref, obj);
+		if (obj)
+			grab_commit_values(val, deref, obj);
 		grab_sub_body_contents(val, deref, data);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
@@ -1757,14 +1775,21 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 		BUG("Object size is less than zero.");
 
 	if (oi->info.contentp) {
-		*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
-		if (!*obj) {
-			if (!eaten)
-				free(oi->content);
-			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
-					       oid_to_hex(&oi->oid), ref->refname);
+		if (ref->can_skip_parse_buffer &&
+		    ((!deref &&
+		     (!need_tagged || oi->type != OBJ_TAG)) ||
+		    deref)) {
+			grab_values(ref->value, deref, NULL, oi);
+		} else {
+			*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
+			if (!*obj) {
+				if (!eaten)
+					free(oi->content);
+				return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
+						       oid_to_hex(&oi->oid), ref->refname);
+			}
+			grab_values(ref->value, deref, *obj, oi);
 		}
-		grab_values(ref->value, deref, *obj, oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
@@ -1988,7 +2013,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	 * If there is no atom that wants to know about tagged
 	 * object, we are done.
 	 */
-	if (!need_tagged || (obj->type != OBJ_TAG))
+	if (!need_tagged || (oi.type != OBJ_TAG))
 		return 0;
 
 	/*
@@ -2595,7 +2620,7 @@ int format_ref_array_item(struct ref_array_item *info,
 
 	state.quote_style = format->quote_style;
 	push_stack_element(&state.stack);
-
+	info->can_skip_parse_buffer = format->can_skip_parse_buffer;
 	for (cp = format->format; *cp && (sp = find_next(cp)); cp = ep + 1) {
 		struct atom_value *atomv;
 		int pos;
diff --git a/ref-filter.h b/ref-filter.h
index c15dee8d6b9..5bceae1dac9 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -40,6 +40,7 @@ struct ref_array_item {
 	struct object_id objectname;
 	const char *rest;
 	int flag;
+	int can_skip_parse_buffer;
 	unsigned int kind;
 	const char *symref;
 	struct commit *commit;
@@ -81,12 +82,12 @@ struct ref_format {
 	int quote_style;
 	int use_rest;
 	int use_color;
-
+	int can_skip_parse_buffer;
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1 }
+#define REF_FORMAT_INIT { .use_color = -1, .can_skip_parse_buffer = 1 }
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/5] [GSOC] ref-filter: remove second parsing in format_ref_array_item
  2021-08-17  7:14 [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 1/5] [GSOC] ref-filter: skip parse_object_buffer in some cases ZheNing Hu via GitGitGadget
@ 2021-08-17  7:14 ` ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 3/5] [GSOC] ref-filter: reuse final buffer ZheNing Hu via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-08-17  7:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

We parsed format string in verify_ref_format() and stored the parsed
atom in used_atom array. But in format_ref_array_item() we have
another round of parsing format string. This affects performance.

Introducing the struct parsed_atom_list which can save the current
atom's start and end address in format string and its index in
used_atom. All parsed_atom_list entry are linked together in the
form of linked list, and the head node of the linked list is stored
in struct ref_format. Create clear_parsed_atom_list() which can used
to clear the nodes on the linked list.

This can bring performance improvement.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/branch.c       |  2 ++
 builtin/for-each-ref.c |  3 ++-
 builtin/tag.c          |  2 ++
 builtin/verify-tag.c   |  2 ++
 ref-filter.c           | 45 +++++++++++++++++++++++++++++++++---------
 ref-filter.h           | 11 +++++++++++
 6 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index b23b1d1752a..e361f8cc661 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -459,6 +459,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	strbuf_release(&err);
 	strbuf_release(&out);
 	ref_array_clear(&array);
+	clear_parsed_atom_list(&format->parsed_atom_head);
 	free(to_free);
 }
 
@@ -678,6 +679,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	memset(&filter, 0, sizeof(filter));
 	filter.kind = FILTER_REFS_BRANCHES;
 	filter.abbrev = -1;
+	INIT_LIST_HEAD(&format.parsed_atom_head);
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_branch_usage, options);
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 89cb6307d46..6e22d80d5b5 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -53,8 +53,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	memset(&array, 0, sizeof(array));
 	memset(&filter, 0, sizeof(filter));
-
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
+	INIT_LIST_HEAD(&format.parsed_atom_head);
 
 	git_config(git_default_config, NULL);
 
@@ -96,6 +96,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	ref_array_clear(&array);
 	free_commit_list(filter.with_commit);
 	free_commit_list(filter.no_commit);
+	clear_parsed_atom_list(&format.parsed_atom_head);
 	UNLEAK(sorting);
 	return 0;
 }
diff --git a/builtin/tag.c b/builtin/tag.c
index 452558ec957..549339cbbe4 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -78,6 +78,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 	strbuf_release(&output);
 	ref_array_clear(&array);
 	free(to_free);
+	clear_parsed_atom_list(&format->parsed_atom_head);
 
 	return 0;
 }
@@ -493,6 +494,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	memset(&filter, 0, sizeof(filter));
 	filter.lines = -1;
 	opt.sign = -1;
+	INIT_LIST_HEAD(&format.parsed_atom_head);
 
 	argc = parse_options(argc, argv, prefix, options, git_tag_usage, 0);
 
diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
index f45136a06ba..8b0a2b2587c 100644
--- a/builtin/verify-tag.c
+++ b/builtin/verify-tag.c
@@ -39,6 +39,7 @@ int cmd_verify_tag(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	INIT_LIST_HEAD(&format.parsed_atom_head);
 	git_config(git_verify_tag_config, NULL);
 
 	argc = parse_options(argc, argv, prefix, verify_tag_options,
@@ -73,5 +74,6 @@ int cmd_verify_tag(int argc, const char **argv, const char *prefix)
 		if (format.format)
 			pretty_print_ref(name, &oid, &format);
 	}
+	clear_parsed_atom_list(&format.parsed_atom_head);
 	return had_error;
 }
diff --git a/ref-filter.c b/ref-filter.c
index 65ba00633dc..76a31fb79b1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1035,6 +1035,7 @@ int verify_ref_format(struct ref_format *format)
 	for (cp = format->format; *cp && (sp = find_next(cp)); ) {
 		struct strbuf err = STRBUF_INIT;
 		const char *color, *ep = strchr(sp, ')');
+		struct parsed_atom_list *e;
 		int at;
 
 		if (!ep)
@@ -1043,6 +1044,12 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		e = xmalloc(sizeof(*e));
+		e->beg = sp + 2;
+		e->end = ep;
+		e->at = at;
+		list_add_tail(&e->list, &format->parsed_atom_head);
+
 		if (need_parse_buffer(used_atom[at].atom_type))
 			format->can_skip_parse_buffer = 0;
 		if (reject_atom(used_atom[at].atom_type))
@@ -2615,25 +2622,31 @@ int format_ref_array_item(struct ref_array_item *info,
 			  struct strbuf *final_buf,
 			  struct strbuf *error_buf)
 {
-	const char *cp, *sp, *ep;
+	const char *cp, *sp;
+	struct list_head *item;
 	struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
 
 	state.quote_style = format->quote_style;
 	push_stack_element(&state.stack);
 	info->can_skip_parse_buffer = format->can_skip_parse_buffer;
-	for (cp = format->format; *cp && (sp = find_next(cp)); cp = ep + 1) {
+
+	cp = format->format;
+
+	list_for_each(item, &format->parsed_atom_head) {
 		struct atom_value *atomv;
-		int pos;
+		struct parsed_atom_list *e =
+			list_entry(item, struct parsed_atom_list, list);
 
-		ep = strchr(sp, ')');
-		if (cp < sp)
-			append_literal(cp, sp, &state);
-		pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf);
-		if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) ||
+		if (cp < e->beg - 2)
+			append_literal(cp, e->beg - 2, &state);
+		if (get_ref_atom_value(info, e->at, &atomv, error_buf) ||
 		    atomv->handler(atomv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
 			return -1;
 		}
+		cp = e->end + 1;
+		if (!*cp)
+			break;
 	}
 	if (*cp) {
 		sp = cp + strlen(cp);
@@ -2681,10 +2694,12 @@ static int parse_sorting_atom(const char *atom)
 	 * This parses an atom using a dummy ref_format, since we don't
 	 * actually care about the formatting details.
 	 */
+	int res;
 	struct ref_format dummy = REF_FORMAT_INIT;
 	const char *end = atom + strlen(atom);
 	struct strbuf err = STRBUF_INIT;
-	int res = parse_ref_filter_atom(&dummy, atom, end, &err);
+
+	res = parse_ref_filter_atom(&dummy, atom, end, &err);
 	if (res < 0)
 		die("%s", err.buf);
 	strbuf_release(&err);
@@ -2757,3 +2772,15 @@ int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset)
 
 	return 0;
 }
+
+void clear_parsed_atom_list(struct list_head *parsed_atom_head)
+{
+	struct list_head *pos, *tmp;
+	struct parsed_atom_list *item;
+
+	list_for_each_safe(pos, tmp, parsed_atom_head) {
+		item = list_entry(pos, struct parsed_atom_list, list);
+		list_del(pos);
+		free(item);
+	}
+}
diff --git a/ref-filter.h b/ref-filter.h
index 5bceae1dac9..df54836a643 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -72,6 +72,13 @@ struct ref_filter {
 		verbose;
 };
 
+struct parsed_atom_list {
+	const char *beg;
+	const char *end;
+	int at;
+	struct list_head list;
+};
+
 struct ref_format {
 	/*
 	 * Set these to define the format; make sure you call
@@ -85,6 +92,7 @@ struct ref_format {
 	int can_skip_parse_buffer;
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
+	struct list_head parsed_atom_head;
 };
 
 #define REF_FORMAT_INIT { .use_color = -1, .can_skip_parse_buffer = 1 }
@@ -112,6 +120,9 @@ struct ref_format {
 int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type);
 /*  Clear all memory allocated to ref_array */
 void ref_array_clear(struct ref_array *array);
+/* Clear the parsed_atom_list in ref_format*/
+void clear_parsed_atom_list(struct list_head *parsed_atom_head);
+
 /*  Used to verify if the given format is correct and to parse out the used atoms */
 int verify_ref_format(struct ref_format *format);
 /*  Sort the given ref_array as per the ref_sorting provided */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/5] [GSOC] ref-filter: reuse final buffer
  2021-08-17  7:14 [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 1/5] [GSOC] ref-filter: skip parse_object_buffer in some cases ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 2/5] [GSOC] ref-filter: remove second parsing in format_ref_array_item ZheNing Hu via GitGitGadget
@ 2021-08-17  7:14 ` ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 4/5] [GSOC] ref-filter: reduce unnecessary object_info comparisons ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 5/5] [GSOC]: ref-filter: instead CALLOC_ARRAY to ALLOC_ARRAY ZheNing Hu via GitGitGadget
  4 siblings, 0 replies; 6+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-08-17  7:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In format_ref_array_item(), we add the object data
to ref_formatting_state, and copy the data from
ref_formatting_state to final_buf at the end. There
are huge copies of data.

Because final_buf will be cleared before every time
we call format_ref_array_item(). So we actually add
content to an empty strbuf. We can add the object's
data directly to this final_buffer instead of adding
objects' data to state.stack->output first, then
copy to final_buf.

Add a can_reuse_final_buffer flag to struct ref_format
and create can_reuse_final_buffer() to check if we are
use %(align), %(end), %(if), %(then), %(else). If not,
we can reuse the buf of finnal_buf.

Reuse the buffer address of final_buf in
format_ref_array_item(), we directly add the data to the
final buffer, return the content to final_buf at the end.

This will bring performance improvements.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 39 ++++++++++++++++++++++++++++++++++-----
 ref-filter.h |  3 ++-
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 76a31fb79b1..7106d4c1c4c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1023,6 +1023,19 @@ static int need_parse_buffer(enum atom_type atom_type) {
 	}
 }
 
+static int can_reuse_final_buffer(enum atom_type atom_type) {
+	switch (atom_type) {
+	case ATOM_ALIGN:
+	case ATOM_END:
+	case ATOM_IF:
+	case ATOM_THEN:
+	case ATOM_ELSE:
+		return 0;
+	default:
+		return 1;
+	}
+}
+
 /*
  * Make sure the format string is well formed, and parse out
  * the used atoms.
@@ -1054,6 +1067,7 @@ int verify_ref_format(struct ref_format *format)
 			format->can_skip_parse_buffer = 0;
 		if (reject_atom(used_atom[at].atom_type))
 			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
+		format->can_reuse_final_buffer = can_reuse_final_buffer(used_atom[at].atom_type);
 
 		if ((format->quote_style == QUOTE_PYTHON ||
 		     format->quote_style == QUOTE_SHELL ||
@@ -2627,7 +2641,14 @@ int format_ref_array_item(struct ref_array_item *info,
 	struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
 
 	state.quote_style = format->quote_style;
-	push_stack_element(&state.stack);
+	if (format->can_reuse_final_buffer) {
+		struct ref_formatting_stack *s = xmalloc(sizeof(struct ref_formatting_stack));
+		s->output = *final_buf;
+		s->prev = state.stack;
+		state.stack = s;
+	} else {
+		push_stack_element(&state.stack);
+	}
 	info->can_skip_parse_buffer = format->can_skip_parse_buffer;
 
 	cp = format->format;
@@ -2641,7 +2662,8 @@ int format_ref_array_item(struct ref_array_item *info,
 			append_literal(cp, e->beg - 2, &state);
 		if (get_ref_atom_value(info, e->at, &atomv, error_buf) ||
 		    atomv->handler(atomv, &state, error_buf)) {
-			pop_stack_element(&state.stack);
+			if (!format->can_reuse_final_buffer)
+				pop_stack_element(&state.stack);
 			return -1;
 		}
 		cp = e->end + 1;
@@ -2656,16 +2678,23 @@ int format_ref_array_item(struct ref_array_item *info,
 		struct atom_value resetv = ATOM_VALUE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
-			pop_stack_element(&state.stack);
+			if (!format->can_reuse_final_buffer)
+				pop_stack_element(&state.stack);
 			return -1;
 		}
 	}
 	if (state.stack->prev) {
+		assert(!format->can_reuse_final_buffer);
 		pop_stack_element(&state.stack);
 		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
 	}
-	strbuf_addbuf(final_buf, &state.stack->output);
-	pop_stack_element(&state.stack);
+	if(format->can_reuse_final_buffer) {
+		*final_buf = state.stack->output;
+		free(state.stack);
+	} else {
+		strbuf_addbuf(final_buf, &state.stack->output);
+		pop_stack_element(&state.stack);
+	}
 	return 0;
 }
 
diff --git a/ref-filter.h b/ref-filter.h
index df54836a643..a62a14a2e43 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,10 +92,11 @@ struct ref_format {
 	int can_skip_parse_buffer;
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
+	int can_reuse_final_buffer;
 	struct list_head parsed_atom_head;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1, .can_skip_parse_buffer = 1 }
+#define REF_FORMAT_INIT { .use_color = -1, .can_skip_parse_buffer = 1, .can_reuse_final_buffer = 1 }
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/5] [GSOC] ref-filter: reduce unnecessary object_info comparisons
  2021-08-17  7:14 [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization ZheNing Hu via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-08-17  7:14 ` [PATCH 3/5] [GSOC] ref-filter: reuse final buffer ZheNing Hu via GitGitGadget
@ 2021-08-17  7:14 ` ZheNing Hu via GitGitGadget
  2021-08-17  7:14 ` [PATCH 5/5] [GSOC]: ref-filter: instead CALLOC_ARRAY to ALLOC_ARRAY ZheNing Hu via GitGitGadget
  4 siblings, 0 replies; 6+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-08-17  7:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In populate_value() we use an empty struct object_info
compare with oi and oi_deref, which determine if we can
return early, this is actually very inefficient.

So add a need_get_object_info flag in global variables,
which used to indicate whether we need call get_object()
or we can return early.

So we only need to compare only one byte instead of comparing
the entire huge object_info struct.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 7106d4c1c4c..35e221e7ec8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -212,7 +212,7 @@ static struct used_atom {
 		char *head;
 	} u;
 } *used_atom;
-static int used_atom_cnt, need_tagged, need_symref;
+static int used_atom_cnt, need_tagged, need_symref, need_get_object_info;
 
 /*
  * Expand string, append it to strbuf *sb, then return error code ret.
@@ -318,10 +318,13 @@ static int objecttype_atom_parser(struct ref_format *format, struct used_atom *a
 {
 	if (arg)
 		return strbuf_addf_ret(err, -1, _("%%(objecttype) does not take arguments"));
-	if (*atom->name == '*')
+	if (*atom->name == '*') {
 		oi_deref.info.typep = &oi_deref.type;
-	else
+		need_get_object_info = 1;
+	} else {
 		oi.info.typep = &oi.type;
+		need_get_object_info = 1;
+	}
 	return 0;
 }
 
@@ -330,16 +333,23 @@ static int objectsize_atom_parser(struct ref_format *format, struct used_atom *a
 {
 	if (!arg) {
 		atom->u.objectsize.option = O_SIZE;
-		if (*atom->name == '*')
+		if (*atom->name == '*') {
 			oi_deref.info.sizep = &oi_deref.size;
-		else
+			need_get_object_info = 1;
+		} else {
 			oi.info.sizep = &oi.size;
+			need_get_object_info = 1;
+		}
 	} else if (!strcmp(arg, "disk")) {
 		atom->u.objectsize.option = O_SIZE_DISK;
-		if (*atom->name == '*')
+		if (*atom->name == '*') {
 			oi_deref.info.disk_sizep = &oi_deref.disk_size;
-		else
+			need_get_object_info = 1;
+		}
+		else {
 			oi.info.disk_sizep = &oi.disk_size;
+			need_get_object_info = 1;
+		}
 	} else
 		return strbuf_addf_ret(err, -1, _("unrecognized %%(objectsize) argument: %s"), arg);
 	return 0;
@@ -354,6 +364,7 @@ static int deltabase_atom_parser(struct ref_format *format, struct used_atom *at
 		oi_deref.info.delta_base_oid = &oi_deref.delta_base_oid;
 	else
 		oi.info.delta_base_oid = &oi.delta_base_oid;
+	need_get_object_info = 1;
 	return 0;
 }
 
@@ -720,6 +731,7 @@ static int parse_ref_filter_atom(struct ref_format *format,
 	used_atom[at].type = valid_atom[i].cmp_type;
 	used_atom[at].source = valid_atom[i].source;
 	if (used_atom[at].source == SOURCE_OBJ) {
+		need_get_object_info = 1;
 		if (*atom == '*')
 			oi_deref.info.contentp = &oi_deref.content;
 		else
@@ -1871,7 +1883,6 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 {
 	struct object *obj;
 	int i;
-	struct object_info empty = OBJECT_INFO_INIT;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
 
@@ -2019,10 +2030,11 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 					       oid_to_hex(&ref->objectname), ref->refname);
 	}
 
-	if (need_tagged)
+	if (need_tagged) {
 		oi.info.contentp = &oi.content;
-	if (!memcmp(&oi.info, &empty, sizeof(empty)) &&
-	    !memcmp(&oi_deref.info, &empty, sizeof(empty)))
+		need_get_object_info = 1;
+	}
+	if (!need_get_object_info)
 		return 0;
 
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 5/5] [GSOC]: ref-filter: instead CALLOC_ARRAY to ALLOC_ARRAY
  2021-08-17  7:14 [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization ZheNing Hu via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-08-17  7:14 ` [PATCH 4/5] [GSOC] ref-filter: reduce unnecessary object_info comparisons ZheNing Hu via GitGitGadget
@ 2021-08-17  7:14 ` ZheNing Hu via GitGitGadget
  4 siblings, 0 replies; 6+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-08-17  7:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

populate_value() uses CALLOC_ARRAY() to allocate dynamic memory for
ref->value. But after that, we immediately assign values to its
members in the for loop. This shows that it is not necessary to
use CALLOC_ARRAY() for ref->value to clear the memory.

Therefore, use ALLOC_ARRAY() instead of CALLOC_ARRAY() to reduce the
overhead caused by clearing the memory. This can bring performance
optimizations.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 35e221e7ec8..fe2df82067f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1884,7 +1884,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	struct object *obj;
 	int i;
 
-	CALLOC_ARRAY(ref->value, used_atom_cnt);
+	ALLOC_ARRAY(ref->value, used_atom_cnt);
 
 	if (need_symref && (ref->flag & REF_ISSYMREF) && !ref->symref) {
 		ref->symref = resolve_refdup(ref->refname, RESOLVE_REF_READING,
@@ -1903,6 +1903,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		const char *refname;
 		struct branch *branch = NULL;
 
+		v->s = NULL;
 		v->s_size = ATOM_SIZE_UNSPECIFIED;
 		v->handler = append_atom;
 		v->atom = atom;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-17  7:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17  7:14 [PATCH 0/5] [GSOC] [RFC] ref-filter: performance optimization ZheNing Hu via GitGitGadget
2021-08-17  7:14 ` [PATCH 1/5] [GSOC] ref-filter: skip parse_object_buffer in some cases ZheNing Hu via GitGitGadget
2021-08-17  7:14 ` [PATCH 2/5] [GSOC] ref-filter: remove second parsing in format_ref_array_item ZheNing Hu via GitGitGadget
2021-08-17  7:14 ` [PATCH 3/5] [GSOC] ref-filter: reuse final buffer ZheNing Hu via GitGitGadget
2021-08-17  7:14 ` [PATCH 4/5] [GSOC] ref-filter: reduce unnecessary object_info comparisons ZheNing Hu via GitGitGadget
2021-08-17  7:14 ` [PATCH 5/5] [GSOC]: ref-filter: instead CALLOC_ARRAY to ALLOC_ARRAY ZheNing Hu via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).