git@vger.kernel.org mailing list mirror (one of many)
* Performance regression in `git branch` due to ref-filter usage
@ 2017-05-17 11:14 Michael Haggerty
  2017-05-17 14:04 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Haggerty @ 2017-05-17 11:14 UTC
  To: Karthik Nayak; +Cc: git discussion list

While working on reference code, I was running `git branch` under
`strace`, when I noticed that `$GIT_DIR/HEAD` was being `lstat()`ed and
`read()` 121 times. This is in a repository with 114 branches, so
probably it is being run once per branch. The extra work makes a
measurable difference to the (admittedly, short) runtime.

As recently as 2.12.3 the file was only read 4 times when running the
same command [1].

The regression bisects to

    949af0684c (branch: use ref-filter printing APIs, 2017-01-10)

It would be nice if these extra syscalls could be avoided.

I haven't checked whether other commands have similar regressions.

Michael

[1] One wonders why the file has to be read more than once, but that's a
different story and probably harder to fix.


* Re: Performance regression in `git branch` due to ref-filter usage
  2017-05-17 11:14 Performance regression in `git branch` due to ref-filter usage Michael Haggerty
@ 2017-05-17 14:04 ` Jeff King
  2017-05-19  6:12   ` [PATCH] ref-filter: resolve HEAD when parsing %(HEAD) atom Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2017-05-17 14:04 UTC
  To: Michael Haggerty; +Cc: Karthik Nayak, git discussion list

On Wed, May 17, 2017 at 01:14:34PM +0200, Michael Haggerty wrote:

> While working on reference code, I was running `git branch` under
> `strace`, when I noticed that `$GIT_DIR/HEAD` was being `lstat()`ed and
> `read()` 121 times. This is in a repository with 114 branches, so
> probably it is being run once per branch. The extra work makes a
> measurable difference to the (admittedly, short) runtime.
> 
> As recently as 2.12.3 the file was only read 4 times when running the
> same command [1].
> 
> The regression bisects to
> 
>     949af0684c (branch: use ref-filter printing APIs, 2017-01-10)
> 
> It would be nice if these extra syscalls could be avoided.
> 
> I haven't checked whether other commands have similar regressions.

It looks like it's part of populate_value(). Each ref checks %(HEAD),
and resolves HEAD individually to see if we're it. So it probably doesn't
affect other commands by default (though you could specify %(HEAD)
manually via for-each-ref).

The solution is to cache the value we read and use it to compare against
each ref. I'm not sure if we can do something more elegant than the
patch below, which just caches it for the length of the program.

> [1] One wonders why the file has to be read more than once, but that's a
> different story and probably harder to fix.

The other ones seem to come from wt_status code, as part of
get_head_description().

---
diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..947919fc4 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1284,6 +1284,20 @@ static const char *get_refname(struct used_atom *atom, struct ref_array_item *re
 	return show_ref(&atom->u.refname, ref->refname);
 }
 
+static int head_matches(const char *refname)
+{
+	static int initialized;
+	static char *head;
+
+	if (!initialized) {
+		unsigned char sha1[20];
+		head = resolve_refdup("HEAD", RESOLVE_REF_READING, sha1, NULL);
+		initialized = 1;
+	}
+
+	return head && !strcmp(refname, head);
+}
+
 /*
  * Parse the object referred by ref, and grab needed value.
  */
@@ -1369,12 +1383,7 @@ static void populate_value(struct ref_array_item *ref)
 		} else if (!deref && grab_objectname(name, ref->objectname, v, atom)) {
 			continue;
 		} else if (!strcmp(name, "HEAD")) {
-			const char *head;
-			unsigned char sha1[20];
-
-			head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
-						  sha1, NULL);
-			if (head && !strcmp(ref->refname, head))
+			if (head_matches(ref->refname))
 				v->s = "*";
 			else
 				v->s = " ";


* [PATCH] ref-filter: resolve HEAD when parsing %(HEAD) atom
  2017-05-17 14:04 ` Jeff King
@ 2017-05-19  6:12   ` Jeff King
  0 siblings, 0 replies; 3+ messages in thread
From: Jeff King @ 2017-05-19  6:12 UTC
  To: Michael Haggerty; +Cc: Karthik Nayak, git discussion list

If the user asks to display (or sort by) the %(HEAD) atom,
ref-filter has to compare each refname to the value of HEAD.
We do so by resolving HEAD fresh when calling populate_value()
on each ref. If there are a large number of refs, this can
have a measurable impact on runtime.

Instead, let's resolve HEAD once when we realize we need the
%(HEAD) atom, allowing us to do a simple string comparison
for each ref. On a repository with 3000 branches (high, but
an actual example found in the wild) this drops the
best-of-five time to run "git branch >/dev/null" from 59ms
to 48ms (~20% savings).

Signed-off-by: Jeff King <peff@peff.net>
---
The "something like this" patch I sent earlier just cached the value of
HEAD in a global for the length of the program. This is a bit nicer, in
that it ties the cache to the atom we are filling in. But since that's
also stored in a program global, the end effect is the same. :) I think
it's still worth doing it this way, though, as we might one day push the
used_atom stuff into some kind of ref_filter_context struct, and then
this would Just Work.

I did take a look at de-globalifying used_atom and friends, but it gets
pretty nasty pushing it all through the callstack. Since there's no
immediate benefit, I don't think it's really worth pursuing for now.
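
For readers who want the shape of the change without the rest of
ref-filter.c, here is a minimal standalone sketch of the "resolve once at
parse time, compare per ref" pattern. It is not the Git code: every
toy_* name, the resolve_calls counter, and the hard-coded
"refs/heads/master" target are assumptions made up for illustration, and
toy_resolve_head() merely stands in for resolve_refdup().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter: how many times HEAD was "resolved". */
static int resolve_calls;

/* Hypothetical stand-in for resolve_refdup("HEAD", ...). */
static char *toy_resolve_head(void)
{
	const char *target = "refs/heads/master"; /* assumed HEAD target */
	char *copy = malloc(strlen(target) + 1);

	resolve_calls++;
	if (copy)
		strcpy(copy, target);
	return copy;
}

/* Toy analogue of used_atom: the resolved HEAD is stored with the atom. */
struct toy_atom {
	char *head;
};

/* Runs once, when the %(HEAD) atom is parsed. */
static void toy_head_atom_parser(struct toy_atom *atom)
{
	atom->head = toy_resolve_head();
}

/* Runs once per ref; only a string comparison is left. */
static const char *toy_head_marker(const struct toy_atom *atom,
				   const char *refname)
{
	return (atom->head && !strcmp(refname, atom->head)) ? "*" : " ";
}

/* Print every ref with its marker; return how many matched HEAD. */
static int toy_show_refs(const char **refs, size_t nr)
{
	struct toy_atom atom;
	size_t i;
	int matches = 0;

	toy_head_atom_parser(&atom);
	for (i = 0; i < nr; i++) {
		const char *marker = toy_head_marker(&atom, refs[i]);

		printf("%s %s\n", marker, refs[i]);
		if (*marker == '*')
			matches++;
	}
	free(atom.head);
	return matches;
}
```

Calling toy_show_refs() on any number of refs bumps resolve_calls by
exactly one, whereas resolving inside the loop (as the old
populate_value() code did) would bump it once per ref.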

 ref-filter.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..82ca411d0 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -93,6 +93,7 @@ static struct used_atom {
 			unsigned int length;
 		} objectname;
 		struct refname_atom refname;
+		char *head;
 	} u;
 } *used_atom;
 static int used_atom_cnt, need_tagged, need_symref;
@@ -287,6 +288,12 @@ static void if_atom_parser(struct used_atom *atom, const char *arg)
 	}
 }
 
+static void head_atom_parser(struct used_atom *atom, const char *arg)
+{
+	unsigned char unused[GIT_SHA1_RAWSZ];
+
+	atom->u.head = resolve_refdup("HEAD", RESOLVE_REF_READING, unused, NULL);
+}
 
 static struct {
 	const char *name;
@@ -325,7 +332,7 @@ static struct {
 	{ "push", FIELD_STR, remote_ref_atom_parser },
 	{ "symref", FIELD_STR, refname_atom_parser },
 	{ "flag" },
-	{ "HEAD" },
+	{ "HEAD", FIELD_STR, head_atom_parser },
 	{ "color", FIELD_STR, color_atom_parser },
 	{ "align", FIELD_STR, align_atom_parser },
 	{ "end" },
@@ -1369,12 +1376,7 @@ static void populate_value(struct ref_array_item *ref)
 		} else if (!deref && grab_objectname(name, ref->objectname, v, atom)) {
 			continue;
 		} else if (!strcmp(name, "HEAD")) {
-			const char *head;
-			unsigned char sha1[20];
-
-			head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
-						  sha1, NULL);
-			if (head && !strcmp(ref->refname, head))
+			if (atom->u.head && !strcmp(ref->refname, atom->u.head))
 				v->s = "*";
 			else
 				v->s = " ";
-- 
2.13.0.219.g63f6bc368


