git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Hariom Verma" <hariom18599@gmail.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Philip Oakley" <philipoakley@iee.email>,
	"ZheNing Hu" <adlternative@gmail.com>,
	"ZheNing Hu" <adlternative@gmail.com>
Subject: [PATCH 17/27] [GSOC] ref-filter: performance optimization by skip parse_object_buffer
Date: Fri, 13 Aug 2021 08:23:00 +0000	[thread overview]
Message-ID: <2c6ce95c8278710b43bf9b7a2fe09732a3172b09.1628842990.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1016.git.1628842990.gitgitgadget@gmail.com>

From: ZheNing Hu <adlternative@gmail.com>

When we are using some atoms such as %(raw), %(objectname),
%(objecttype), we don't need to parse the content of the object,
so we can skip parse_object_buffer() for performance optimization.

It is worth noting that in these cases, we still need to call
parse_object_buffer() for parsing:

1. The atom type is one of %(tag), %(type), %(object), %(tree),
%(numparent) or %(parent).
2. The type of the object is tag and the atom need to be
dereferenced e.g. %(*objecttype).

With this patch, `git cat-file --batch` has a 12.5% performance
improvement. But there is still a certain gap with
`git cat-file --batch` before using the logic of ref-filter.
`git cat-file --batch` is 10% worse, `git cat-file --batch-check`
is 30% worse.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 53 ++++++++++++++++++++++++++++++++++++++--------------
 ref-filter.h |  5 +++--
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 8664d9a6f4e..1251e062ff8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1025,6 +1025,20 @@ static int reject_atom(int cat_file_mode, enum atom_type atom_type)
 	}
 }
 
+static int need_parse_buffer(enum atom_type atom_type) {
+	switch (atom_type) {
+	case ATOM_TAG:
+	case ATOM_TYPE:
+	case ATOM_OBJECT:
+	case ATOM_TREE:
+	case ATOM_NUMPARENT:
+	case ATOM_PARENT:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 /*
  * Make sure the format string is well formed, and parse out
  * the used atoms.
@@ -1045,6 +1059,8 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (need_parse_buffer(used_atom[at].atom_type))
+			format->can_skip_parse_buffer = 0;
 		if (reject_atom(format->cat_file_mode, used_atom[at].atom_type))
 			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
 
@@ -1532,14 +1548,16 @@ static void grab_values(struct atom_value *val, int deref, struct object *obj, s
 {
 	void *buf = data->content;
 
-	switch (obj->type) {
+	switch (data->type) {
 	case OBJ_TAG:
-		grab_tag_values(val, deref, obj);
+		if (obj)
+			grab_tag_values(val, deref, obj);
 		grab_sub_body_contents(val, deref, data);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
-		grab_commit_values(val, deref, obj);
+		if (obj)
+			grab_commit_values(val, deref, obj);
 		grab_sub_body_contents(val, deref, data);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
@@ -1792,16 +1810,23 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 				actual_oi = &act_oi;
 			}
 		}
-		*obj = parse_object_buffer(the_repository, &actual_oi->oid, actual_oi->type, actual_oi->size, actual_oi->content, &eaten);
-		if (!*obj) {
-			if (!eaten)
-				free(actual_oi->content);
-			if (actual_oi != oi)
-				free(oi->content);
-			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
-					       oid_to_hex(&oi->oid), ref->refname);
+		if (ref->can_skip_parse_buffer &&
+		    ((!deref &&
+		     (!need_tagged || oi->type != OBJ_TAG)) ||
+		    deref)) {
+			grab_values(ref->value, deref, NULL, actual_oi);
+		} else {
+			*obj = parse_object_buffer(the_repository, &actual_oi->oid, actual_oi->type, actual_oi->size, actual_oi->content, &eaten);
+			if (!*obj) {
+				if (!eaten)
+					free(actual_oi->content);
+				if (actual_oi != oi)
+					free(oi->content);
+				return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
+						       oid_to_hex(&oi->oid), ref->refname);
+			}
+			grab_values(ref->value, deref, *obj, actual_oi);
 		}
-		grab_values(ref->value, deref, *obj, actual_oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
@@ -2029,7 +2054,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	 * If there is no atom that wants to know about tagged
 	 * object, we are done.
 	 */
-	if (!need_tagged || (obj->type != OBJ_TAG))
+	if (!need_tagged || (oi.type != OBJ_TAG))
 		return 0;
 
 	/*
@@ -2651,7 +2676,7 @@ int format_ref_array_item(struct ref_array_item *info,
 
 	state.quote_style = format->quote_style;
 	push_stack_element(&state.stack);
-
+	info->can_skip_parse_buffer = format->can_skip_parse_buffer;
 	for (cp = format->format; *cp && (sp = find_next(cp)); cp = ep + 1) {
 		struct atom_value *atomv;
 		int pos;
diff --git a/ref-filter.h b/ref-filter.h
index 853eb554a5b..8cae9f576d5 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -41,6 +41,7 @@ struct ref_array_item {
 	const char *rest;
 	int cat_file_cmdmode;
 	int flag;
+	int can_skip_parse_buffer;
 	unsigned int kind;
 	const char *symref;
 	struct commit *commit;
@@ -83,12 +84,12 @@ struct ref_format {
 	int quote_style;
 	int use_rest;
 	int use_color;
-
+	int can_skip_parse_buffer;
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1 }
+#define REF_FORMAT_INIT { .use_color = -1, .can_skip_parse_buffer = 1 }
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
-- 
gitgitgadget


  parent reply	other threads:[~2021-08-13  8:25 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-13  8:22 [PATCH 00/27] [GSOC] [RFC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 01/27] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 02/27] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 03/27] [GSOC] ref-filter: --format=%(raw) support --perl ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 04/27] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 05/27] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 06/27] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 07/27] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 08/27] [GSOC] ref-filter: add cat_file_mode to ref_format ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 09/27] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 10/27] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 11/27] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 12/27] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 13/27] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 14/27] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 15/27] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-08-13  8:22 ` [PATCH 16/27] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` ZheNing Hu via GitGitGadget [this message]
2021-08-13  8:23 ` [PATCH 18/27] [GSOC] ref-filter: use atom_type and merge two for loop in grab_person ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 19/27] [GSOC] ref-filter: remove strlen from find_subpos ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 20/27] [GSOC] ref-filter: introducing xstrvfmt_len() and xstrfmt_len() ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 21/27] [GSOC] ref-filter: remove second parsing in format_ref_array_item ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 22/27] [GSOC] ref-filter: introduction ref_filter_slopbuf ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 23/27] [GSOC] ref-filter: add deref member to struct used_atom ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 24/27] [GSOC] ref-filter: introduce symref_atom_parser() ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 25/27] [GSOC] ref-filter: use switch case instread of if else ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 26/27] [GSOC] ref-filter: reuse finnal buffer if no stack need ZheNing Hu via GitGitGadget
2021-08-13  8:23 ` [PATCH 27/27] [GSOC] ref-filter: add need_get_object_info flag to struct expand_data ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c6ce95c8278710b43bf9b7a2fe09732a3172b09.1628842990.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=peff@peff.net \
    --cc=philipoakley@iee.email \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).