git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <dstolee@microsoft.com>
To: git@vger.kernel.org
Cc: peff@peff.net, git@jeffhostetler.com, stolee@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()
Date: Fri,  1 Dec 2017 12:49:56 -0500	[thread overview]
Message-ID: <20171201174956.143245-1-dstolee@microsoft.com> (raw)

Replace use of strbuf_addf() with strbuf_add() when enumerating
loose objects in for_each_file_in_obj_subdir(). Since we already
check the length and hex-values of the string before consuming
the path, we can prevent extra computation by using the lower-
level method.

One consumer of for_each_file_in_obj_subdir() is the abbreviation
code. OID abbreviations use a cached list of loose objects (per
object subdirectory) to make repeated queries fast, but there is
significant cache load time when there are many loose objects.

Most repositories do not have many loose objects before repacking,
but in the GVFS case the repos can grow to have millions of loose
objects. Profiling 'git log' performance in GitForWindows on a
GVFS-enabled repo with ~2.5 million loose objects revealed 12% of
the CPU time was spent in strbuf_addf().

Add a new performance test to p4211-line-log.sh that is more
sensitive to this cache-loading. By limiting to 1000 commits, we
more closely resemble user wait time when reading history into a
pager.

For a copy of the Linux repo with two ~512 MB packfiles and ~572K
loose objects, running 'git log --oneline --raw --parents -1000'
had the following performance:

 HEAD~1            HEAD
----------------------------------------
 7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_file.c              | 7 ++++---
 t/perf/p4211-line-log.sh | 4 ++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 8ae6cb6285..2160323c4a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	}
 
 	oid.hash[0] = subdir_nr;
+	strbuf_add(path, "/", 1);
+	baselen = path->len;
 
 	while ((de = readdir(dir))) {
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		strbuf_setlen(path, baselen);
-		strbuf_addf(path, "/%s", de->d_name);
-
 		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
 		    !hex_to_bytes(oid.hash + 1, de->d_name,
 				  GIT_SHA1_RAWSZ - 1)) {
+			strbuf_setlen(path, baselen);
+			strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
 			if (obj_cb) {
 				r = obj_cb(&oid, path->buf, data);
 				if (r)
diff --git a/t/perf/p4211-line-log.sh b/t/perf/p4211-line-log.sh
index e0ed05907c..392bcc0e51 100755
--- a/t/perf/p4211-line-log.sh
+++ b/t/perf/p4211-line-log.sh
@@ -35,4 +35,8 @@ test_perf 'git log --oneline --raw --parents' '
 	git log --oneline --raw --parents >/dev/null
 '
 
+test_perf 'git log --oneline --raw --parents -1000' '
+	git log --oneline --raw --parents -1000 >/dev/null
+'
+
 test_done
-- 
2.15.0


             reply	other threads:[~2017-12-01 17:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-01 17:49 Derrick Stolee [this message]
2017-12-01 18:22 ` [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf() Jeff King
2017-12-01 19:50   ` Derrick Stolee
2017-12-01 22:39     ` Jeff King
2017-12-04 14:06       ` [PATCH v2] " Derrick Stolee
2017-12-04 16:56         ` Jeff King
2017-12-04 17:12           ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171201174956.143245-1-dstolee@microsoft.com \
    --to=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).