git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new
@ 2011-03-31  1:37 Dan McGee
  2011-03-31  1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Dan McGee @ 2011-03-31  1:37 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

This was seen to happen in some invocations of git-log with a filtered
path. Only do it if we are not recursively descending, as otherwise we
mess with copy and rename detection in full tree moves.

We need to exclude times when FIND_COPIES_HARDER is enabled as the diff
machinery requires traversal of all files in this case to find copies
from files that did not change at all in the given revision.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---

These next few patches are all derived from trying to make operations on this
repository a bit faster:
    http://projects.archlinux.org/svntogit/packages.git/

As far as a different shape of repository, this one qualifies. Some stats from
after running `git gc --prune` and looking at things from count-objects among
others:

                    arch-packages   linux-2.6
in-pack:                   496718     1979274
size-pack:                 112582      513977
ls | wc -l:                  2401          34
ls-tree -r | wc -l          14039       36706
ls-tree -r -t | grep ' tree ' | wc -l
                            11211        2256
time git log -- zzzzz_not_exist >/dev/null
                          35.558s      0.976s

I didn't find some golden switch to turn that made things instantly fast, but
these patches did at least give some speedups. Suggestions/feedback/etc.
welcome. Most profiling was done with `valgrind --tool=callgrind` and then
using kcachegrind to chase down slow spots.

 tree-diff.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 76f83fc..ab90f1a 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -286,6 +286,9 @@ int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const cha
 	unsigned long size1, size2;
 	int retval;
 
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(old, new))
+		return 0;
+
 	tree1 = read_object_with_reference(old, tree_type, &size1, NULL);
 	if (!tree1)
 		die("unable to read source tree (%s)", sha1_to_hex(old));
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
@ 2011-03-31  1:37 ` Dan McGee
  2011-08-30 18:55   ` Dan McGee
  2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-03-31  1:37 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 tree-walk.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 322becc..9be8007 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -522,7 +522,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 	return 0;
 }
 
-static int match_dir_prefix(const char *base, int baselen,
+static int match_dir_prefix(const char *base,
 			    const char *match, int matchlen)
 {
 	if (strncmp(base, match, matchlen))
@@ -579,7 +579,7 @@ int tree_entry_interesting(const struct name_entry *entry,
 
 		if (baselen >= matchlen) {
 			/* If it doesn't match, move along... */
-			if (!match_dir_prefix(base_str, baselen, match, matchlen))
+			if (!match_dir_prefix(base_str, match, matchlen))
 				goto match_wildcards;
 
 			if (!ps->recursive || ps->max_depth == -1)
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
  2011-03-31  1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
@ 2011-03-31  1:37 ` Dan McGee
  2011-04-03  4:01   ` Nguyen Thai Ngoc Duy
                     ` (2 more replies)
  2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 30+ messages in thread
From: Dan McGee @ 2011-03-31  1:37 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

In the case of a wide breadth top-level tree (~2400 entries, all trees
in this case), we can see a noticeable cost in the profiler calling
strncmp() here. Most of the time we are at the base level of the
repository, so base is "" and baselen == 0, which means we will always
test true. Break out this one tiny case so we can short circuit the
strncmp() call.

This resulted in an ~11% improvement (43 to 38 secs) for a reasonable
log operation on the Arch Linux Packages SVN clone repository, which
contained 117220 commits and the aforementioned 2400 top-level objects:
    git log -- autogen/trunk pacman/trunk/ wget/trunk/

Negligible slowdown was noted with other repositories (e.g. linux-2.6).

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 tree-walk.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 9be8007..f386151 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -591,8 +591,8 @@ int tree_entry_interesting(const struct name_entry *entry,
 					      ps->max_depth);
 		}
 
-		/* Does the base match? */
-		if (!strncmp(base_str, match, baselen)) {
+		/* Either there must be no base, or the base must match. */
+		if (baselen == 0 || !strncmp(base_str, match, baselen)) {
 			if (match_entry(entry, pathlen,
 					match + baselen, matchlen - baselen,
 					&never_interesting))
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
  2011-03-31  1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
  2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
@ 2011-03-31  1:38 ` Dan McGee
  2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
                     ` (2 more replies)
  2011-03-31  1:38 ` [PATCH 5/5] tree-walk: match_entry microoptimization Dan McGee
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 30+ messages in thread
From: Dan McGee @ 2011-03-31  1:38 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

We know our mode entry in our tree objects should be 5 or 6 characters
long. This change both enforces this fact and also unrolls the parsing
of the information giving the compiler more room for optimization of the
operations.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 tree-walk.c |   41 ++++++++++++++++++++++++++++++++++-------
 1 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index f386151..41383b0 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -9,16 +9,43 @@ static const char *get_mode(const char *str, unsigned int *modep)
 	unsigned char c;
 	unsigned int mode = 0;
 
-	if (*str == ' ')
-		return NULL;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
+	/*
+	 * Unroll what looks like a loop since the bounds are
+	 * well-known. There should be at least 5 and at most 6
+	 * characters available in any valid mode, as '40000' is the
+	 * shortest while '160000' (S_IFGITLINK) is the longest.
+	 */
+	/* char 1 */
+	c = *str++;
+	if (c < '0' || c > '7') return NULL;
+	mode = (mode << 3) + (c - '0');
+	/* char 2 */
+	c = *str++;
+	if (c < '0' || c > '7') return NULL;
+	mode = (mode << 3) + (c - '0');
+	/* char 3 */
+	c = *str++;
+	if (c < '0' || c > '7') return NULL;
+	mode = (mode << 3) + (c - '0');
+	/* char 4 */
+	c = *str++;
+	if (c < '0' || c > '7') return NULL;
+	mode = (mode << 3) + (c - '0');
+	/* char 5 */
+	c = *str++;
+	if (c < '0' || c > '7') return NULL;
+	mode = (mode << 3) + (c - '0');
+	/* char 6, optional */
+	if (*str != ' ') {
+		c = *str++;
+		if (c < '0' || c > '7') return NULL;
 		mode = (mode << 3) + (c - '0');
 	}
+
+	if (*str != ' ') return NULL;
+
 	*modep = mode;
-	return str;
+	return str + 1;
 }
 
 static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size)
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/5] tree-walk: match_entry microoptimization
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
                   ` (2 preceding siblings ...)
  2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
@ 2011-03-31  1:38 ` Dan McGee
  2011-04-02  9:06   ` Nguyen Thai Ngoc Duy
  2011-03-31 12:58 ` [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Nguyen Thai Ngoc Duy
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-03-31  1:38 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

Before calling strncmp(), see if we can get away with checking only the
first character of the passed path components instead.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 tree-walk.c |   26 ++++++++++++++++++--------
 1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 41383b0..083b951 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -488,9 +488,10 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 		       const char *match, int matchlen,
 		       int *never_interesting)
 {
-	int m = -1; /* signals that we haven't called strncmp() */
+	int m = -1; /* signals that we haven't compared strings */
 
 	if (*never_interesting) {
+		const int maxlen = (matchlen < pathlen) ? matchlen : pathlen;
 		/*
 		 * We have not seen any match that sorts later
 		 * than the current path.
@@ -500,10 +501,13 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 		 * Does match sort strictly earlier than path
 		 * with their common parts?
 		 */
-		m = strncmp(match, entry->path,
-			    (matchlen < pathlen) ? matchlen : pathlen);
-		if (m < 0)
-			return 0;
+		if (maxlen && match[0] > entry->path[0]) {
+			/* no good for the shortcut here, match must be <= */
+		} else {
+			m = strncmp(match, entry->path, maxlen);
+			if(m < 0)
+				return 0;
+		}
 
 		/*
 		 * If we come here even once, that means there is at
@@ -531,12 +535,17 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 			return 0;
 	}
 
-	if (m == -1)
+	if (m == -1) {
 		/*
-		 * we cheated and did not do strncmp(), so we do
+		 * we cheated and did compare strings, so we do
 		 * that here.
 		 */
-		m = strncmp(match, entry->path, pathlen);
+		if (pathlen && match[0] == entry->path[0])
+			/* invariant: matchlen == pathlen */
+			m = strncmp(match, entry->path, pathlen);
+		else
+			m = 1;
+	}
 
 	/*
 	 * If common part matched earlier then it is a hit,
@@ -552,6 +561,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 static int match_dir_prefix(const char *base,
 			    const char *match, int matchlen)
 {
+	/* invariant: baselen >= matchlen */
 	if (strncmp(base, match, matchlen))
 		return 0;
 
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
                   ` (3 preceding siblings ...)
  2011-03-31  1:38 ` [PATCH 5/5] tree-walk: match_entry microoptimization Dan McGee
@ 2011-03-31 12:58 ` Nguyen Thai Ngoc Duy
  2011-03-31 13:56   ` Dan McGee
  2011-04-01 22:28 ` Junio C Hamano
  2011-05-03  7:34 ` Nguyen Thai Ngoc Duy
  6 siblings, 1 reply; 30+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-03-31 12:58 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Thu, Mar 31, 2011 at 8:37 AM, Dan McGee <dpmcgee@gmail.com> wrote:
> These next few patches are all derived from trying to make operations on this
> repository a bit faster:
>    http://projects.archlinux.org/svntogit/packages.git/
>
> ...
>
> time git log -- zzzzz_not_exist >/dev/null
>                          35.558s      0.976s

Do you have numbers before and after applying this series? It'd be
interesting to see how much we gain from the series.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new
  2011-03-31 12:58 ` [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Nguyen Thai Ngoc Duy
@ 2011-03-31 13:56   ` Dan McGee
  0 siblings, 0 replies; 30+ messages in thread
From: Dan McGee @ 2011-03-31 13:56 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Thu, Mar 31, 2011 at 7:58 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Mar 31, 2011 at 8:37 AM, Dan McGee <dpmcgee@gmail.com> wrote:
>> These next few patches are all derived from trying to make operations on this
>> repository a bit faster:
>>    http://projects.archlinux.org/svntogit/packages.git/
>>
>> ...
>>
>> time git log -- zzzzz_not_exist >/dev/null
>>                          35.558s      0.976s
>
> Do you have numbers before and after applying this series? It'd be
> interesting to see how much we gain from the series.

Whipped some up for you:

Repo            Operation                           master   after    delta
arch-packages   ../git/git-log                      2.26     2.27     -0.01
arch-packages   ../git/git-log -- zzzzz_not_exist   34.69    26.75    7.93
arch-packages   ../git/git-log -- aaaaa_first       6.02     5.76     0.26
linux-2.6       ../git/git-log                      5.51     5.50     0.02
linux-2.6       ../git/git-log -- zzzzz_not_exist   0.95     0.92     0.03
linux-2.6       ../git/git-log -- aaaaa_first       0.89     0.86     0.03

These are all the "real" value from time with the full set of patches
applied. As you can see, one either gains or isn't really affected by
these; anything under 0.05s delta is noise.

Bigger gains than this don't seem too plausible unless we move away
from linear parsing and traversal of tree objects.

-Dan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
                   ` (4 preceding siblings ...)
  2011-03-31 12:58 ` [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Nguyen Thai Ngoc Duy
@ 2011-04-01 22:28 ` Junio C Hamano
       [not found]   ` <AANLkTinPSqDPdGi5nA3sH1D2wMSW1SQc+5gRqdLy++y0@mail.gmail.com>
  2011-05-03  7:34 ` Nguyen Thai Ngoc Duy
  6 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2011-04-01 22:28 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

Dan McGee <dpmcgee@gmail.com> writes:

> This was seen to happen in some invocations of git-log with a filtered
> path. Only do it if we are not recursively descending, as otherwise we
> mess with copy and rename detection in full tree moves.

There is no code that corresponds to your "Only do it..." description in
your patch, though.  The existing code already takes care of that part
with or without your patch, no?

> diff --git a/tree-diff.c b/tree-diff.c
> index 76f83fc..ab90f1a 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -286,6 +286,9 @@ int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const cha
>  	unsigned long size1, size2;
>  	int retval;
>  
> +	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(old, new))
> +		return 0;
> +

I am very curious why this patch makes a difference; doesn't an existing
test in compare_tree_entry() oalready cull extra recursion?  There is:

	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) &&
		mode1 == mode2)
		return 0;

before a recursive call to diff_tree_sha1() to dig deeper.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] tree-walk: match_entry microoptimization
  2011-03-31  1:38 ` [PATCH 5/5] tree-walk: match_entry microoptimization Dan McGee
@ 2011-04-02  9:06   ` Nguyen Thai Ngoc Duy
  2011-04-02 17:54     ` Dan McGee
  0 siblings, 1 reply; 30+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-04-02  9:06 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
> Before calling strncmp(), see if we can get away with checking only the
> first character of the passed path components instead.
>
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>

I wonder if inlining strcmp() would be better (at least cleaner code).
It seems like you try to avoid the function call cost here.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
@ 2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
  2011-04-02 17:28     ` Dan McGee
  2011-04-04 10:29   ` Erik Faye-Lund
  2011-04-04 16:55   ` Junio C Hamano
  2 siblings, 1 reply; 30+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-04-02  9:28 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
> We know our mode entry in our tree objects should be 5 or 6 characters
> long. This change both enforces this fact and also unrolls the parsing
> of the information giving the compiler more room for optimization of the
> operations.

I'm skeptical. Did you measure signficant gain after this patch? I
looked at asm output with -O3 and failed to see the compiler doing
anything fancy. Perhaps it's because I'm on x86 with quite small
register set.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
@ 2011-04-02 17:28     ` Dan McGee
  2011-04-03  4:07       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-04-02 17:28 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Sat, Apr 2, 2011 at 4:28 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
>> We know our mode entry in our tree objects should be 5 or 6 characters
>> long. This change both enforces this fact and also unrolls the parsing
>> of the information giving the compiler more room for optimization of the
>> operations.
>
> I'm skeptical. Did you measure signficant gain after this patch? I
> looked at asm output with -O3 and failed to see the compiler doing
> anything fancy. Perhaps it's because I'm on x86 with quite small
> register set.

I'm on x86_64 and was just using -O2; -O3 produces the same output
actually. You can see it below. I had taken a look at this before I
submitted, and noticed a few things:
1. We do use multiple registers now since we aren't constrained to a loop.
2. movzbl (for the string parts) and cmb instructions tend to get
clustered first.
3. mozbl (for the mode shifting) and leal instructions tend to get
clustered later.
4. The normal case now involves no conditional jumps until the ' '
(space) comparison.

Call these "trivial", but on my worst case operation times went from
(shown below) 27.41 secs to 26.49 secs. Considering this operation is
called 530,588,868 times (that is not a typo) during this operation,
every saved instruction or non-missed branch prediction does seem to
make a difference.

-Dan

Repo:
http://projects.archlinux.org/svntogit/packages.git/

Old:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null

real	0m27.409s
user	0m27.172s
sys	0m0.230s

.LVL3:
.LBB58:
.LBB59:
	.loc 1 12 0 is_stmt 1
	movzbl	(%rsi), %eax
	cmpb	$32, %al
	je	.L5
.LVL4:
	.loc 1 16 0
	leal	-48(%rax), %edx
.LVL5:
	.loc 1 15 0
	leaq	1(%rsi), %rdi
.LVL6:
	.loc 1 16 0
	cmpb	$7, %dl
	ja	.L5
	xorl	%edx, %edx
	jmp	.L6
.LVL7:
	.p2align 4,,10
	.p2align 3
.L7:
	leal	-48(%rax), %ecx
	cmpb	$7, %cl
	ja	.L5
.LVL8:
.L6:
	.loc 1 18 0
	movzbl	%al, %eax
	leal	-48(%rax,%rdx,8), %edx
.LVL9:
	.loc 1 15 0
	movzbl	(%rdi), %eax
.LVL10:
	addq	$1, %rdi
.LVL11:
	cmpb	$32, %al
	jne	.L7


New:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null

real	0m26.490s
user	0m26.282s
sys	0m0.200s

.LVL3:
.LBB58:
.LBB59:
	.loc 1 19 0 is_stmt 1
	movzbl	(%rsi), %eax
.LVL4:
	.loc 1 20 0
	leal	-48(%rax), %edx
.LVL5:
	cmpb	$7, %dl
	ja	.L5
.LVL6:
	.loc 1 23 0
	movzbl	1(%rsi), %edx
.LVL7:
	.loc 1 24 0
	leal	-48(%rdx), %ecx
	cmpb	$7, %cl
	ja	.L5
.LVL8:
	.loc 1 27 0
	movzbl	2(%rsi), %ecx
.LVL9:
	.loc 1 28 0
	leal	-48(%rcx), %edi
	cmpb	$7, %dil
	ja	.L5
.LVL10:
	.loc 1 31 0
	movzbl	3(%rsi), %edi
.LVL11:
	.loc 1 32 0
	leal	-48(%rdi), %r8d
	cmpb	$7, %r8b
	ja	.L5
.LVL12:
	.loc 1 35 0
	movzbl	4(%rsi), %r8d
.LVL13:
	.loc 1 36 0
	leal	-48(%r8), %r9d
	cmpb	$7, %r9b
	ja	.L5
	.loc 1 21 0
	movzbl	%al, %eax
	.loc 1 25 0
	movzbl	%dl, %edx
	.loc 1 29 0
	movzbl	%cl, %ecx
	.loc 1 25 0
	leal	-432(%rdx,%rax,8), %edx
	.loc 1 33 0
	movzbl	%dil, %edi
	.loc 1 37 0
	movzbl	%r8b, %r8d
	.loc 1 35 0
	leaq	5(%rsi), %rax
	.loc 1 29 0
	leal	-48(%rcx,%rdx,8), %edx
	.loc 1 39 0
	movzbl	5(%rsi), %ecx
	.loc 1 33 0
	leal	-48(%rdi,%rdx,8), %edx
	.loc 1 39 0
	cmpb	$32, %cl
	.loc 1 37 0
	leal	-48(%r8,%rdx,8), %edx
.LVL14:
	.loc 1 39 0
	je	.L7
.LVL15:
	.loc 1 41 0
	leal	-48(%rcx), %eax
	cmpb	$7, %al
	ja	.L5
	.loc 1 45 0
	cmpb	$32, 6(%rsi)
	.loc 1 42 0
	movzbl	%cl, %ecx
	.loc 1 40 0
	leaq	6(%rsi), %rax
	.loc 1 42 0
	leal	-48(%rcx,%rdx,8), %edx
.LVL16:
	.loc 1 45 0
	jne	.L5

diff --git a/tree-walk.c b/tree-walk.c
index 63901f8..dd7bd45 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -4,11 +4,22 @@
 #include "dir.h"
 #include "tree.h"

+static unsigned long hit_ctr = 0;
+
+static void print_hit_ctr(void)
+{
+       fprintf(stderr, "hit_ctr: %lu\n", hit_ctr);
+}
+
 static const char *get_mode(const char *str, unsigned int *modep)
 {
        unsigned char c;
        unsigned int mode = 0;

+       if(hit_ctr == 0) {
+               atexit(print_hit_ctr);
+       }
+       hit_ctr++;
        /*
         * Unroll what looks like a loop since the bounds are
         * well-known. There should be at least 5 and at most 6

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] tree-walk: match_entry microoptimization
  2011-04-02  9:06   ` Nguyen Thai Ngoc Duy
@ 2011-04-02 17:54     ` Dan McGee
  0 siblings, 0 replies; 30+ messages in thread
From: Dan McGee @ 2011-04-02 17:54 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Sat, Apr 2, 2011 at 4:06 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
>> Before calling strncmp(), see if we can get away with checking only the
>> first character of the passed path components instead.
>>
>> Signed-off-by: Dan McGee <dpmcgee@gmail.com>
>
> I wonder if inlining strcmp() would be better (at least cleaner code).
> It seems like you try to avoid the function call cost here.

100% agree with you, but I have no idea how to do that. Realize also
you can use memcmp() here; I tried that and got no noticeable gain
(likely due to how short these strings are). However, it is what we
use in most other functions in this file. I assume the compiler would
inline versions with a static length, but if you have suggestions as
to how to inline a non-trivial version here I'm all ears.

-Dan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Fwd: [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new
       [not found]   ` <AANLkTinPSqDPdGi5nA3sH1D2wMSW1SQc+5gRqdLy++y0@mail.gmail.com>
@ 2011-04-02 18:38     ` Dan McGee
  0 siblings, 0 replies; 30+ messages in thread
From: Dan McGee @ 2011-04-02 18:38 UTC (permalink / raw)
  To: git

Forgot to forward this to the list as well, I apologize.

On Fri, Apr 1, 2011 at 5:28 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Dan McGee <dpmcgee@gmail.com> writes:
>
>> This was seen to happen in some invocations of git-log with a filtered
>> path. Only do it if we are not recursively descending, as otherwise we
>> mess with copy and rename detection in full tree moves.
>
> There is no code that corresponds to your "Only do it..." description in
> your patch, though.  The existing code already takes care of that part
> with or without your patch, no?

Damn, I forgot to update the message- see below.

>> diff --git a/tree-diff.c b/tree-diff.c
>> index 76f83fc..ab90f1a 100644
>> --- a/tree-diff.c
>> +++ b/tree-diff.c
>> @@ -286,6 +286,9 @@ int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const cha
>>       unsigned long size1, size2;
>>       int retval;
>>
>> +     if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(old, new))
>> +             return 0;
>> +
>
> I am very curious why this patch makes a difference; doesn't an existing
> test in compare_tree_entry() oalready cull extra recursion?  There is:

This was originally testing RECURSIVE; however I discovered that was
not the culprit to my failed tests.

t9300-fastimport.sh was failing on "copy then modify subdirectory" due
to the full info not being loaded for the before sha1 in that test-
instead of showing the fcf778cda ... C100 part (this is just the first
line of expected, all were the same), it was 000000 ... A. once I
added the above fallthrough to not shortcut if this option was
enabled, things worked fine and all tests passed.

>        if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) &&
>                mode1 == mode2)
>                return 0;
>
> before a recursive call to diff_tree_sha1() to dig deeper.
>

I'm not totally sure why this check wasn't working, but without the
above exception my patch definitely broke tests.

-Dan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting
  2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
@ 2011-04-03  4:01   ` Nguyen Thai Ngoc Duy
  2011-04-03 18:55   ` Junio C Hamano
  2011-04-04 14:46   ` [PATCH] tree_entry_interesting: inline strncmp() Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 30+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-04-03  4:01 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Wed, Mar 30, 2011 at 08:37:59PM -0500, Dan McGee wrote:
> In the case of a wide breadth top-level tree (~2400 entries, all trees
> in this case), we can see a noticeable cost in the profiler calling
> strncmp() here. Most of the time we are at the base level of the
> repository, so base is "" and baselen == 0, which means we will always
> test true. Break out this one tiny case so we can short circuit the
> strncmp() call.
> 
> This resulted in an ~11% improvement (43 to 38 secs) for a reasonable
> log operation on the Arch Linux Packages SVN clone repository, which
> contained 117220 commits and the aforementioned 2400 top-level objects:
>     git log -- autogen/trunk pacman/trunk/ wget/trunk/
> 
> Negligible slowdown was noted with other repositories (e.g. linux-2.6).
> 
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>

Ack.

On my (slower) laptop, the same command with vanilla git -O3 takes
95 secs. Your patch cuts it down to 82 secs.

I tried inlining strncmp with a version from glibc. Without your patch
it takes 88 secs. With your patch, it takes 86 secs (still worse than
your patch alone). A better strncmp version must be used on my system,
I suspect.

--8<--
diff --git a/tree-walk.c b/tree-walk.c
index 322becc..0859b5c 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -457,6 +457,55 @@ int get_tree_entry(const unsigned char *tree_sha1, const char *name, unsigned ch
 	return retval;
 }
 
+static inline int inline_strncmp(const char *s1, const char *s2, size_t n)
+{
+	unsigned char c1 = '\0';
+	unsigned char c2 = '\0';
+
+	if (n == 0)
+		return 0;
+
+	if (n >= 4) {
+		size_t n4 = n >> 2;
+		do {
+			c1 = (unsigned char) *s1++;
+			c2 = (unsigned char) *s2++;
+			if (c1 == '\0' || c1 != c2)
+				return c1 - c2;
+			c1 = (unsigned char) *s1++;
+			c2 = (unsigned char) *s2++;
+			if (c1 == '\0' || c1 != c2)
+				return c1 - c2;
+			c1 = (unsigned char) *s1++;
+			c2 = (unsigned char) *s2++;
+			if (c1 == '\0' || c1 != c2)
+				return c1 - c2;
+			c1 = (unsigned char) *s1++;
+			c2 = (unsigned char) *s2++;
+			if (c1 == '\0' || c1 != c2)
+				return c1 - c2;
+		} while (--n4 > 0);
+		n &= 3;
+	}
+
+	while (n > 0) {
+		c1 = (unsigned char) *s1++;
+		c2 = (unsigned char) *s2++;
+		if (c1 == '\0' || c1 != c2)
+			return c1 - c2;
+		n--;
+	}
+
+	return c1 - c2;
+}
+
+#if 1
+#ifdef strncmp
+#undef strncmp
+#endif
+#define strncmp inline_strncmp
+#endif
+
 static int match_entry(const struct name_entry *entry, int pathlen,
 		       const char *match, int matchlen,
 		       int *never_interesting)
--8<--
-- 
Duy

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-04-02 17:28     ` Dan McGee
@ 2011-04-03  4:07       ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 30+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-04-03  4:07 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Sun, Apr 3, 2011 at 12:28 AM, Dan McGee <dpmcgee@gmail.com> wrote:
> On Sat, Apr 2, 2011 at 4:28 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
>>> We know our mode entry in our tree objects should be 5 or 6 characters
>>> long. This change both enforces this fact and also unrolls the parsing
>>> of the information giving the compiler more room for optimization of the
>>> operations.
>>
>> I'm skeptical. Did you measure signficant gain after this patch? I
>> looked at asm output with -O3 and failed to see the compiler doing
>> anything fancy. Perhaps it's because I'm on x86 with quite small
>> register set.
>
> I'm on x86_64 and was just using -O2; -O3 produces the same output
> actually. You can see it below. I had taken a look at this before I
> submitted, and noticed a few things:
> 1. We do use multiple registers now since we aren't constrained to a loop.
> 2. movzbl (for the string parts) and cmb instructions tend to get
> clustered first.
> 3. mozbl (for the mode shifting) and leal instructions tend to get
> clustered later.
> 4. The normal case now involves no conditional jumps until the ' '
> (space) comparison.
>
> Call these "trivial", but on my worst case operation times went from
> (shown below) 27.41 secs to 26.49 secs. Considering this operation is
> called 530,588,868 times (that is not a typo) during this operation,
> every saved instruction or non-missed branch prediction does seem to
> make a difference.

If it makes it better for you, I'm good.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting
  2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
  2011-04-03  4:01   ` Nguyen Thai Ngoc Duy
@ 2011-04-03 18:55   ` Junio C Hamano
  2011-04-05  0:22     ` Dan McGee
  2011-04-04 14:46   ` [PATCH] tree_entry_interesting: inline strncmp() Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2011-04-03 18:55 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

Dan McGee <dpmcgee@gmail.com> writes:

> In the case of a wide breadth top-level tree (~2400 entries, all trees
> in this case), we can see a noticeable cost in the profiler calling
> strncmp() here. Most of the time we are at the base level of the
> repository, so base is "" and baselen == 0, which means we will always
> test true. Break out this one tiny case so we can short circuit the
> strncmp() call.

This sounds as if the patch helps only when you have a superfat tree at
the "top-level" of the project, but wouldn't this benefit any superfat
tree at _any_ level while we recursively descend into it?

> This resulted in an ~11% improvement (43 to 38 secs) for a reasonable
> log operation on the Arch Linux Packages SVN clone repository, which
> contained 117220 commits and the aforementioned 2400 top-level objects:
>     git log -- autogen/trunk pacman/trunk/ wget/trunk/
>
> Negligible slowdown was noted with other repositories (e.g. linux-2.6).

It would have been easier to swallow if the last sentence were "This could
lead to a slowdown in repositories without directories that are too wide,
but in practice it was not even measurable."  "Negligible" sounds as if it
had still measurable downside, and as if you decided that the slowdown can
be ignored---but obviously you are not an unbiased judge.

There is nothing wrong in the patch per-se, but I really wish we didn't
have to do this; it feels like the compiler should be helping us in this
case.

> Signed-off-by: Dan McGee <dpmcgee@gmail.com>
> ---
>  tree-walk.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index 9be8007..f386151 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -591,8 +591,8 @@ int tree_entry_interesting(const struct name_entry *entry,
>  					      ps->max_depth);
>  		}
>  
> -		/* Does the base match? */
> -		if (!strncmp(base_str, match, baselen)) {
> +		/* Either there must be no base, or the base must match. */
> +		if (baselen == 0 || !strncmp(base_str, match, baselen)) {
>  			if (match_entry(entry, pathlen,
>  					match + baselen, matchlen - baselen,
>  					&never_interesting))

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
  2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
@ 2011-04-04 10:29   ` Erik Faye-Lund
  2011-04-04 12:30     ` Andreas Ericsson
  2011-04-04 16:55   ` Junio C Hamano
  2 siblings, 1 reply; 30+ messages in thread
From: Erik Faye-Lund @ 2011-04-04 10:29 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Thu, Mar 31, 2011 at 3:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
> We know our mode entry in our tree objects should be 5 or 6 characters
> long. This change both enforces this fact and also unrolls the parsing
> of the information giving the compiler more room for optimization of the
> operations.
>
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>
> ---
>  tree-walk.c |   41 ++++++++++++++++++++++++++++++++++-------
>  1 files changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index f386151..41383b0 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -9,16 +9,43 @@ static const char *get_mode(const char *str, unsigned int *modep)
>        unsigned char c;
>        unsigned int mode = 0;
>
> -       if (*str == ' ')
> -               return NULL;
> -
> -       while ((c = *str++) != ' ') {
> -               if (c < '0' || c > '7')
> -                       return NULL;
> +       /*
> +        * Unroll what looks like a loop since the bounds are
> +        * well-known. There should be at least 5 and at most 6
> +        * characters available in any valid mode, as '40000' is the
> +        * shortest while '160000' (S_IFGITLINK) is the longest.
> +        */
> +       /* char 1 */
> +       c = *str++;
> +       if (c < '0' || c > '7') return NULL;

We perfer this style:

if (c < '0' || c > '7')
	return NULL;

i.e a line-break and a tab between the if-statement and the conditional code.

> +       mode = (mode << 3) + (c - '0');
> +       /* char 2 */
> +       c = *str++;
> +       if (c < '0' || c > '7') return NULL;
> +       mode = (mode << 3) + (c - '0');
> +       /* char 3 */
> +       c = *str++;
> +       if (c < '0' || c > '7') return NULL;
> +       mode = (mode << 3) + (c - '0');
> +       /* char 4 */
> +       c = *str++;
> +       if (c < '0' || c > '7') return NULL;
> +       mode = (mode << 3) + (c - '0');
> +       /* char 5 */
> +       c = *str++;
> +       if (c < '0' || c > '7') return NULL;
> +       mode = (mode << 3) + (c - '0');

Wouldn't this part be cleaner as a constant-length loop? Any
optimizing compiler should end up unrolling this, and we don't get as
much code-duplication...

Oddly enough, this change gave me a drastic (> 20%) performance
increase in my test (isolating get_mode in a separate compilation
unit, and calling it in a loop):

diff --git a/tree-walk.c b/tree-walk.c
index b8d504b..114ad63 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -7,42 +7,30 @@
 static const char *get_mode(const char *str, unsigned int *modep)
 {
 	unsigned char c;
-	unsigned int mode = 0;
+	unsigned int mode = 0, i;

 	/*
-	 * Unroll what looks like a loop since the bounds are
+	 * Allow the compiler to unroll the loop since the bounds are
 	 * well-known. There should be at least 5 and at most 6
 	 * characters available in any valid mode, as '40000' is the
 	 * shortest while '160000' (S_IFGITLINK) is the longest.
 	 */
-	/* char 1 */
-	c = *str++;
-	if (c < '0' || c > '7') return NULL;
-	mode = (mode << 3) + (c - '0');
-	/* char 2 */
-	c = *str++;
-	if (c < '0' || c > '7') return NULL;
-	mode = (mode << 3) + (c - '0');
-	/* char 3 */
-	c = *str++;
-	if (c < '0' || c > '7') return NULL;
-	mode = (mode << 3) + (c - '0');
-	/* char 4 */
-	c = *str++;
-	if (c < '0' || c > '7') return NULL;
-	mode = (mode << 3) + (c - '0');
-	/* char 5 */
-	c = *str++;
-	if (c < '0' || c > '7') return NULL;
-	mode = (mode << 3) + (c - '0');
+	for (i = 0; i < 5; ++i) {
+		c = *str++;
+		if (c < '0' || c > '7')
+			return NULL;
+		mode = (mode << 3) + (c - '0');
+	}
 	/* char 6, optional */
 	if (*str != ' ') {
 		c = *str++;
-		if (c < '0' || c > '7') return NULL;
+		if (c < '0' || c > '7')
+			return NULL;
 		mode = (mode << 3) + (c - '0');
 	}

-	if (*str != ' ') return NULL;
+	if (*str != ' ')
+		return NULL;

 	*modep = mode;
 	return str + 1;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-04-04 10:29   ` Erik Faye-Lund
@ 2011-04-04 12:30     ` Andreas Ericsson
  0 siblings, 0 replies; 30+ messages in thread
From: Andreas Ericsson @ 2011-04-04 12:30 UTC (permalink / raw)
  To: kusmabite; +Cc: Dan McGee, git

On 04/04/2011 12:29 PM, Erik Faye-Lund wrote:
> 
> Wouldn't this part be cleaner as a constant-length loop? Any
> optimizing compiler should end up unrolling this, and we don't get as
> much code-duplication...
> 

It would.

> Oddly enough, this change gave me a drastic (>  20%) performance
> increase in my test (isolating get_mode in a separate compilation
> unit, and calling it in a loop):
> 
> diff --git a/tree-walk.c b/tree-walk.c
> index b8d504b..114ad63 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -7,42 +7,30 @@
>   static const char *get_mode(const char *str, unsigned int *modep)
>   {
>   	unsigned char c;
> -	unsigned int mode = 0;
> +	unsigned int mode = 0, i;
> 
>   	/*
> -	 * Unroll what looks like a loop since the bounds are
> +	 * Allow the compiler to unroll the loop since the bounds are
>   	 * well-known. There should be at least 5 and at most 6
>   	 * characters available in any valid mode, as '40000' is the
>   	 * shortest while '160000' (S_IFGITLINK) is the longest.
>   	 */
> -	/* char 1 */
> -	c = *str++;
> -	if (c<  '0' || c>  '7') return NULL;
> -	mode = (mode<<  3) + (c - '0');
> -	/* char 2 */
> -	c = *str++;
> -	if (c<  '0' || c>  '7') return NULL;
> -	mode = (mode<<  3) + (c - '0');
> -	/* char 3 */
> -	c = *str++;
> -	if (c<  '0' || c>  '7') return NULL;
> -	mode = (mode<<  3) + (c - '0');
> -	/* char 4 */
> -	c = *str++;
> -	if (c<  '0' || c>  '7') return NULL;
> -	mode = (mode<<  3) + (c - '0');
> -	/* char 5 */
> -	c = *str++;
> -	if (c<  '0' || c>  '7') return NULL;
> -	mode = (mode<<  3) + (c - '0');
> +	for (i = 0; i<  5; ++i) {

s/i< 5/i < 5/ (nitpicking, yes)

> +		c = *str++;
> +		if (c<  '0' || c>  '7')
> +			return NULL;
> +		mode = (mode<<  3) + (c - '0');

This could be (micro-)optimized further as:

--%<--%<--
for (i = 0; i < 5; i++) {
	int c = *str++ - '0';
	if (c & ~7) {
		/* 5 chars is ok, so long as *str is now a space */
		if (i == 5 && c == ' ') {
			*modep = mode;
			return str;
		}
		return NULL;
	}
	mode = (mode << 3) + c;
}

/* this should be a space, or we've got a malformed entry */
if (*str != ' ')
	return NULL;

return str + 1;
--%<--%<--

This should be slightly faster since there's now only one comparison
inside the loop and since one branch contains a second branch fork-point
and an inevitable early return the branch prediction machinery in gcc
will favor the most common case properly, but pre-computations on 'mode'
have to be undone in one case of hitting the early return, it might be
faster to play pickup after a loop to 5 is done.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] tree_entry_interesting: inline strncmp()
  2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
  2011-04-03  4:01   ` Nguyen Thai Ngoc Duy
  2011-04-03 18:55   ` Junio C Hamano
@ 2011-04-04 14:46   ` Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-04-04 14:46 UTC (permalink / raw)
  To: git, Junio C Hamano, dpmcgee; +Cc: Nguyễn Thái Ngọc Duy

strncmp() is the function that takes most of the time inside
tree_entry_interesting(). Inline it so we can shave some seconds out
of function call time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Turns out simplicity is the best. My straight copy of strncmp from
 glibc performed worse.

 With this I get a slightly better performance than Dan's 3/5:
 81.07-82.27 secs versus 82.02-82.92 (no other patches are applied).
 But I'm happy even if it gives the same or slightly worse
 performance because this applies to more cases than flat top tree
 case.

 Dan, match_dir_prefix() can also use some reordering to avoid
 strncmp(). But I suppose it won't give much gain on packages.git

 tree-walk.c |   28 ++++++++++++++++++++++++----
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 322becc..80bfc3a 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -457,6 +457,26 @@ int get_tree_entry(const unsigned char *tree_sha1, const char *name, unsigned ch
 	return retval;
 }
 
+/* Static version of strncmp to reduce function call cost */
+static inline int strncmp_1(const char *s1, const char *s2, size_t n)
+{
+	unsigned char c1 = '\0';
+	unsigned char c2 = '\0';
+
+	if (!n)
+		return 0;
+
+	while (n > 0) {
+		c1 = (unsigned char) *s1++;
+		c2 = (unsigned char) *s2++;
+		if (c1 == '\0' || c1 != c2)
+			return c1 - c2;
+		n--;
+	}
+
+	return c1 - c2;
+}
+
 static int match_entry(const struct name_entry *entry, int pathlen,
 		       const char *match, int matchlen,
 		       int *never_interesting)
@@ -473,7 +493,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 		 * Does match sort strictly earlier than path
 		 * with their common parts?
 		 */
-		m = strncmp(match, entry->path,
+		m = strncmp_1(match, entry->path,
 			    (matchlen < pathlen) ? matchlen : pathlen);
 		if (m < 0)
 			return 0;
@@ -509,7 +529,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 		 * we cheated and did not do strncmp(), so we do
 		 * that here.
 		 */
-		m = strncmp(match, entry->path, pathlen);
+		m = strncmp_1(match, entry->path, pathlen);
 
 	/*
 	 * If common part matched earlier then it is a hit,
@@ -525,7 +545,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 static int match_dir_prefix(const char *base, int baselen,
 			    const char *match, int matchlen)
 {
-	if (strncmp(base, match, matchlen))
+	if (strncmp_1(base, match, matchlen))
 		return 0;
 
 	/*
@@ -592,7 +612,7 @@ int tree_entry_interesting(const struct name_entry *entry,
 		}
 
 		/* Does the base match? */
-		if (!strncmp(base_str, match, baselen)) {
+		if (!strncmp_1(base_str, match, baselen)) {
 			if (match_entry(entry, pathlen,
 					match + baselen, matchlen - baselen,
 					&never_interesting))
-- 
1.7.4.74.g639db

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
  2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
  2011-04-04 10:29   ` Erik Faye-Lund
@ 2011-04-04 16:55   ` Junio C Hamano
  2011-04-05  5:33     ` Dan McGee
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2011-04-04 16:55 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

Dan McGee <dpmcgee@gmail.com> writes:

> We know our mode entry in our tree objects should be 5 or 6 characters
> long. This change both enforces this fact...

I find the implementation later shown in the thread is cleaner, but I'd
comment on the word "enforces" here.

It is more like "versions of git we know how to read from writes mode bits
with 5 or 6 characters---if we see something else, either the tree object
is corrupt, or we are too old to read the new type of entry in the tree
object".

So returning NULL is fine and it tells the caller that we do not
understand the tree object.  The caller says "corrupt tree file" when we
do so here, but this change needs to rephrase it.  If we stopped because
we saw something other than ' ' after the run of octal digits, then we
know the tree is corrupt.  If we saw a three octal digits 644 in the mode
field, terminated with ' ', maybe we are seeing a new kind of tree entry
generated from later versions of git.

Ideally, I'd rather see error checking done even higher layer than
decode_tree_entry() for the "we are too old" case, though.  We should
return mode 0644 in such a case, and let the caller suggest that the
version of git we are running might be too old, or the tree may have been
written by a broken system that tried to mimic git in its error message.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting
  2011-04-03 18:55   ` Junio C Hamano
@ 2011-04-05  0:22     ` Dan McGee
       [not found]       ` <CAEik5nOKrpFycZYVnSu4_5LYWxn0JS_hVXyiQH-80Bu-C4k8VQ@mail.gmail.com>
  0 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-04-05  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy

On Sun, Apr 3, 2011 at 1:55 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Dan McGee <dpmcgee@gmail.com> writes:
>
>> In the case of a wide breadth top-level tree (~2400 entries, all trees
>> in this case), we can see a noticeable cost in the profiler calling
>> strncmp() here. Most of the time we are at the base level of the
>> repository, so base is "" and baselen == 0, which means we will always
>> test true. Break out this one tiny case so we can short circuit the
>> strncmp() call.
>
> This sounds as if the patch helps only when you have a superfat tree at
> the "top-level" of the project, but wouldn't this benefit any superfat
> tree at _any_ level while we recursively descend into it?

Correct. I looked at the fact that more often than not, we wouldn't
have to descend into subtrees unless searching for a path underneath
it, so that is why I phrased it that way. So the "in the case of" was
quite literally the case I was testing, but didn't mean to exclude
other potential test cases.

>> This resulted in an ~11% improvement (43 to 38 secs) for a reasonable
>> log operation on the Arch Linux Packages SVN clone repository, which
>> contained 117220 commits and the aforementioned 2400 top-level objects:
>>     git log -- autogen/trunk pacman/trunk/ wget/trunk/
>>
>> Negligible slowdown was noted with other repositories (e.g. linux-2.6).
>
> It would have been easier to swallow if the last sentence were "This could
> lead to a slowdown in repositories without directories that are too wide,
> but in practice it was not even measurable."  "Negligible" sounds as if it
> had still measurable downside, and as if you decided that the slowdown can
> be ignored---but obviously you are not an unbiased judge.

Perhaps I was too cautious with my words- but I was also trying to not
be biased. Considering this same operation takes < 1 second in
linux-2.6, I only wanted to mention it could have a slight effect. In
reality I saw nothing more than an extra 0.01s or so, and definitely
nothing significant. Let me know if you see otherwise.

dmcgee@galway ~/projects/linux-2.6 (master)
$ time ../git/git-log -- zzzzz_not_exist > /dev/null

real	0m0.945s
user	0m0.857s
sys	0m0.083s

> There is nothing wrong in the patch per-se, but I really wish we didn't
> have to do this; it feels like the compiler should be helping us in this
> case.
>
>> Signed-off-by: Dan McGee <dpmcgee@gmail.com>
>> ---
>>  tree-walk.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tree-walk.c b/tree-walk.c
>> index 9be8007..f386151 100644
>> --- a/tree-walk.c
>> +++ b/tree-walk.c
>> @@ -591,8 +591,8 @@ int tree_entry_interesting(const struct name_entry *entry,
>>                                             ps->max_depth);
>>               }
>>
>> -             /* Does the base match? */
>> -             if (!strncmp(base_str, match, baselen)) {
>> +             /* Either there must be no base, or the base must match. */
>> +             if (baselen == 0 || !strncmp(base_str, match, baselen)) {
>>                       if (match_entry(entry, pathlen,
>>                                       match + baselen, matchlen - baselen,
>>                                       &never_interesting))
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-04-04 16:55   ` Junio C Hamano
@ 2011-04-05  5:33     ` Dan McGee
  2011-04-05 23:55       ` Antriksh Pany
  0 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-04-05  5:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Erik Faye-Lund

On Mon, Apr 4, 2011 at 5:29 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>
> We perfer this style:
>
> if (c < '0' || c > '7')
>        return NULL;
>
> i.e a line-break and a tab between the if-statement and the conditional code.
I am well aware, I just wanted to make the code a bit less lengthy due
to the repetition.

On Mon, Apr 4, 2011 at 11:55 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Dan McGee <dpmcgee@gmail.com> writes:
>
>> We know our mode entry in our tree objects should be 5 or 6 characters
>> long. This change both enforces this fact...
>
> I find the implementation later shown in the thread is cleaner,
Totally agree; I should have tried to do it this way in the first
place. However, compiling the fixed-length 0 to 5 loop does not
produce fully-unrolled assembly for me with CFLAGS="-march=native
-mtune=native -O2 -pipe -g" on x86_64. I see two copies of the loop
only, and even worse is the (lack of) performance (each is the mode of
3 runs). Compilers are stupid apparently.

Hand-unrolled:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null
real	0m28.616s
user	0m28.428s
sys	0m0.177s

Static counter loop:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null
real	0m26.393s
user	0m26.185s
sys	0m0.203s

Diff between the two down at the bottom.

> but I'd
> comment on the word "enforces" here.
>
> It is more like "versions of git we know how to read from writes mode bits
> with 5 or 6 characters---if we see something else, either the tree object
> is corrupt, or we are too old to read the new type of entry in the tree
> object".
OK, sounds like we have opposite thinking on this, interesting.

get_mode() is not get_permissions()- it is the same as the stat()
st_mode field, which means it has to (as of now) include at least one
of the S_IFREG, S_IFDIR, or S_IFGITLINK flags in its value.

All of S_ISREG, S_ISDIR, and S_ISGITLINK are used fsck_walk_tree(). So
I'm not seeing a change in the tree storage format happening anytime
soon as there is little way to preserve both forward and backward
compatibility.

Currently git will happily parse both a "1" in the tree stream, or a
"165324132132464677321513252351"- it doesn't care. The problem is in
the former case, a mode of 01 doesn't have any significance later on.
In the latter case, we will be left-shifting our mode to hell and it
becomes worthless.

Do we disagree on clamping the lower limit at 5, the upper limit at 6, or both?

> So returning NULL is fine and it tells the caller that we do not
> understand the tree object.  The caller says "corrupt tree file" when we
> do so here, but this change needs to rephrase it.  If we stopped because
> we saw something other than ' ' after the run of octal digits, then we
> know the tree is corrupt.  If we saw a three octal digits 644 in the mode
> field, terminated with ' ', maybe we are seeing a new kind of tree entry
> generated from later versions of git.
>
> Ideally, I'd rather see error checking done even higher layer than
> decode_tree_entry() for the "we are too old" case, though.  We should
> return mode 0644 in such a case, and let the caller suggest that the
> version of git we are running might be too old, or the tree may have been
> written by a broken system that tried to mimic git in its error message.
>

diff --git a/tree-walk.c b/tree-walk.c
index 63901f8..63ec130 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -7,34 +7,21 @@
 static const char *get_mode(const char *str, unsigned int *modep)
 {
        unsigned char c;
-       unsigned int mode = 0;
+       unsigned int mode = 0, i;

        /*
-        * Unroll what looks like a loop since the bounds are
+        * Allow the compiler to unroll the loop since the bounds are
         * well-known. There should be at least 5 and at most 6
         * characters available in any valid mode, as '40000' is the
         * shortest while '160000' (S_IFGITLINK) is the longest.
         */
-       /* char 1 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 2 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 3 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 4 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 5 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
+       /* chars 1-5, optional */
+       for (i = 0; i < 5; i++) {
+               c = *str++;
+               if (c < '0' || c > '7')
+                       return NULL;
+               mode = (mode << 3) + (c - '0');
+       }
        /* char 6, optional */
        if (*str != ' ') {
                c = *str++;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-04-05  5:33     ` Dan McGee
@ 2011-04-05 23:55       ` Antriksh Pany
  2011-04-06 20:45         ` Dan McGee
  0 siblings, 1 reply; 30+ messages in thread
From: Antriksh Pany @ 2011-04-05 23:55 UTC (permalink / raw)
  To: Dan McGee; +Cc: Junio C Hamano, git, Erik Faye-Lund

On Mon, Apr 4, 2011 at 10:33 PM, Dan McGee <dpmcgee@gmail.com> wrote:
> ....
> Totally agree; I should have tried to do it this way in the first
> place. However, compiling the fixed-length 0 to 5 loop does not
> produce fully-unrolled assembly for me with CFLAGS="-march=native
> -mtune=native -O2 -pipe -g" on x86_64. I see two copies of the loop
> only, and even worse is the (lack of) performance (each is the mode of
> 3 runs). Compilers are stupid apparently.
> ....

Can you try -O3? Or an explicit '-funroll-loops'?
gcc I think does not do aggressive speed optimizations at the cost of
space when at O2.

Cheers
Antriksh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
  2011-04-05 23:55       ` Antriksh Pany
@ 2011-04-06 20:45         ` Dan McGee
  0 siblings, 0 replies; 30+ messages in thread
From: Dan McGee @ 2011-04-06 20:45 UTC (permalink / raw)
  To: Antriksh Pany; +Cc: Junio C Hamano, git, Erik Faye-Lund

On Tue, Apr 5, 2011 at 6:55 PM, Antriksh Pany <antriksh.pany@gmail.com> wrote:
> On Mon, Apr 4, 2011 at 10:33 PM, Dan McGee <dpmcgee@gmail.com> wrote:
>> ....
>> Totally agree; I should have tried to do it this way in the first
>> place. However, compiling the fixed-length 0 to 5 loop does not
>> produce fully-unrolled assembly for me with CFLAGS="-march=native
>> -mtune=native -O2 -pipe -g" on x86_64. I see two copies of the loop
>> only, and even worse is the (lack of) performance (each is the mode of
>> 3 runs). Compilers are stupid apparently.
>> ....
>
> Can you try -O3? Or an explicit '-funroll-loops'?
> gcc I think does not do aggressive speed optimizations at the cost of
> space when at O2.

Sure- both of these options show the loop being unrolled for all 5
iterations. However, that doesn't help me and the other 95% of people
using distro packages, git-scm.com binaries, or anyone compiling with
the default CFLAGS optimization level which is unfortunate.

-Dan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new
  2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
                   ` (5 preceding siblings ...)
  2011-04-01 22:28 ` Junio C Hamano
@ 2011-05-03  7:34 ` Nguyen Thai Ngoc Duy
  6 siblings, 0 replies; 30+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-05-03  7:34 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

On Thu, Mar 31, 2011 at 8:37 AM, Dan McGee <dpmcgee@gmail.com> wrote:
> These next few patches are all derived from trying to make operations on this
> repository a bit faster:
>    http://projects.archlinux.org/svntogit/packages.git/
>
> As far as a different shape of repository, this one qualifies. Some stats from
> after running `git gc --prune` and looking at things from count-objects among
> others:

Does this series need any work to get (at least part of) it in?
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix
  2011-03-31  1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
@ 2011-08-30 18:55   ` Dan McGee
  0 siblings, 0 replies; 30+ messages in thread
From: Dan McGee @ 2011-08-30 18:55 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

On Wed, Mar 30, 2011 at 8:37 PM, Dan McGee <dpmcgee@gmail.com> wrote:
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>
> ---

This still seems to apply to the current code, ping?

>  tree-walk.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index 322becc..9be8007 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -522,7 +522,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
>        return 0;
>  }
>
> -static int match_dir_prefix(const char *base, int baselen,
> +static int match_dir_prefix(const char *base,
>                            const char *match, int matchlen)
>  {
>        if (strncmp(base, match, matchlen))
> @@ -579,7 +579,7 @@ int tree_entry_interesting(const struct name_entry *entry,
>
>                if (baselen >= matchlen) {
>                        /* If it doesn't match, move along... */
> -                       if (!match_dir_prefix(base_str, baselen, match, matchlen))
> +                       if (!match_dir_prefix(base_str, match, matchlen))
>                                goto match_wildcards;
>
>                        if (!ps->recursive || ps->max_depth == -1)
> --
> 1.7.4.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting
       [not found]       ` <CAEik5nOKrpFycZYVnSu4_5LYWxn0JS_hVXyiQH-80Bu-C4k8VQ@mail.gmail.com>
@ 2011-08-30 19:51         ` Dan McGee
  2011-08-30 20:40           ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-08-30 19:51 UTC (permalink / raw)
  To: git

On Mon, Apr 4, 2011 at 7:22 PM, Dan McGee <dpmcgee@gmail.com> wrote:
> On Sun, Apr 3, 2011 at 1:55 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Dan McGee <dpmcgee@gmail.com> writes:
>>
>>> In the case of a wide breadth top-level tree (~2400 entries, all trees
>>> in this case), we can see a noticeable cost in the profiler calling
>>> strncmp() here. Most of the time we are at the base level of the
>>> repository, so base is "" and baselen == 0, which means we will always
>>> test true. Break out this one tiny case so we can short circuit the
>>> strncmp() call.
>>
>> This sounds as if the patch helps only when you have a superfat tree at
>> the "top-level" of the project, but wouldn't this benefit any superfat
>> tree at _any_ level while we recursively descend into it?
>
> Correct. I looked at the fact that more often than not, we wouldn't
> have to descend into subtrees unless searching for a path underneath
> it, so that is why I phrased it that way. So the "in the case of" was
> quite literally the case I was testing, but didn't mean to exclude
> other potential test cases.
>
>>> This resulted in an ~11% improvement (43 to 38 secs) for a reasonable
>>> log operation on the Arch Linux Packages SVN clone repository, which
>>> contained 117220 commits and the aforementioned 2400 top-level objects:
>>>     git log -- autogen/trunk pacman/trunk/ wget/trunk/
>>>
>>> Negligible slowdown was noted with other repositories (e.g. linux-2.6).
>>
>> It would have been easier to swallow if the last sentence were "This could
>> lead to a slowdown in repositories without directories that are too wide,
>> but in practice it was not even measurable."  "Negligible" sounds as if it
>> had still measurable downside, and as if you decided that the slowdown can
>> be ignored---but obviously you are not an unbiased judge.
>
> Perhaps I was too cautious with my words- but I was also trying to not
> be biased. Considering this same operation takes < 1 second in
> linux-2.6, I only wanted to mention it could have a slight effect. In
> reality I saw nothing more than an extra 0.01s or so, and definitely
> nothing significant. Let me know if you see otherwise.
>
> dmcgee@galway ~/projects/linux-2.6 (master)
> $ time ../git/git-log -- zzzzz_not_exist > /dev/null
>
> real    0m0.945s
> user    0m0.857s
> sys     0m0.083s
>
>> There is nothing wrong in the patch per-se, but I really wish we didn't
>> have to do this; it feels like the compiler should be helping us in this
>> case.

If I resurrect this with an updated commit message reflecting concerns
raised, can it be merged? Given that it is a noticeable performance
boost on real-life repositories and I can show it has little (<1%) to
no impact on most repos, it is a definite win.

>>> Signed-off-by: Dan McGee <dpmcgee@gmail.com>
>>> ---
>>>  tree-walk.c |    4 ++--
>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tree-walk.c b/tree-walk.c
>>> index 9be8007..f386151 100644
>>> --- a/tree-walk.c
>>> +++ b/tree-walk.c
>>> @@ -591,8 +591,8 @@ int tree_entry_interesting(const struct name_entry *entry,
>>>                                             ps->max_depth);
>>>               }
>>>
>>> -             /* Does the base match? */
>>> -             if (!strncmp(base_str, match, baselen)) {
>>> +             /* Either there must be no base, or the base must match. */
>>> +             if (baselen == 0 || !strncmp(base_str, match, baselen)) {
>>>                       if (match_entry(entry, pathlen,
>>>                                       match + baselen, matchlen - baselen,
>>>                                       &never_interesting))
>>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting
  2011-08-30 19:51         ` Dan McGee
@ 2011-08-30 20:40           ` Junio C Hamano
  2011-09-09  2:02             ` [PATCH 1/2] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2011-08-30 20:40 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

Dan McGee <dpmcgee@gmail.com> writes:

>> dmcgee@galway ~/projects/linux-2.6 (master)
>> $ time ../git/git-log -- zzzzz_not_exist > /dev/null
>>
>> real    0m0.945s
>> user    0m0.857s
>> sys     0m0.083s
>>
>>> There is nothing wrong in the patch per-se, but I really wish we didn't
>>> have to do this; it feels like the compiler should be helping us in this
>>> case.
>
> If I resurrect this with an updated commit message reflecting concerns
> raised, can it be merged? Given that it is a noticeable performance
> boost on real-life repositories and I can show it has little (<1%) to
> no impact on most repos, it is a definite win.

I do not see anything wrong in this particular patch per-se, but I really
wish we didn't have to do this.

Please include a few lines of benchmarking result in the updated commit
log message as well if you are rerolling this patch, perhaps like I did
in:

  http://thread.gmane.org/gmane.comp.version-control.git/179926/focus=180361

i.e., before and after comparison.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] tree-walk: drop unused parameter from match_dir_prefix
  2011-08-30 20:40           ` Junio C Hamano
@ 2011-09-09  2:02             ` Dan McGee
  2011-09-09  2:02               ` [PATCH 2/2] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
  0 siblings, 1 reply; 30+ messages in thread
From: Dan McGee @ 2011-09-09  2:02 UTC (permalink / raw)
  To: git

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 tree-walk.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 33f749e..dbcd94a 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -522,7 +522,7 @@ static int match_entry(const struct name_entry *entry, int pathlen,
 	return 0;
 }
 
-static int match_dir_prefix(const char *base, int baselen,
+static int match_dir_prefix(const char *base,
 			    const char *match, int matchlen)
 {
 	if (strncmp(base, match, matchlen))
@@ -579,7 +579,7 @@ int tree_entry_interesting(const struct name_entry *entry,
 
 		if (baselen >= matchlen) {
 			/* If it doesn't match, move along... */
-			if (!match_dir_prefix(base_str, baselen, match, matchlen))
+			if (!match_dir_prefix(base_str, match, matchlen))
 				goto match_wildcards;
 
 			if (!ps->recursive || ps->max_depth == -1)
-- 
1.7.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] tree-walk: micro-optimization in tree_entry_interesting
  2011-09-09  2:02             ` [PATCH 1/2] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
@ 2011-09-09  2:02               ` Dan McGee
  0 siblings, 0 replies; 30+ messages in thread
From: Dan McGee @ 2011-09-09  2:02 UTC (permalink / raw)
  To: git

In the case of a wide breadth top-level tree (~2400 entries, all trees
in this case), we can see a noticeable cost in the profiler calling
strncmp() here. Most of the time we are at the base level of the
repository, so base is "" and baselen == 0, which means we will always
test true. Break out this one tiny case so we can short circuit the
strncmp() call.

Test cases are as follows. packages.git is the Arch Linux git-svn clone
of the packages repository which has the characteristics above.

Commands:
[1] packages.git, /usr/bin/time git log >/dev/null
[2] packages.git, /usr/bin/time git log -- autogen/trunk pacman/trunk wget/trunk >/dev/null
[3] linux.git, /usr/bin/time git log >/dev/null
[4] linux.git, /usr/bin/time git log -- drivers/ata drivers/uio tools >/dev/null

Results:
     before  after  %faster
[1]   2.56    2.55   0.4%
[2]  51.82   48.66   6.5%
[3]   5.58    5.61  -0.5%
[4]   1.55    1.51   0.2%

The takeaway here is this doesn't matter in many operations, but it does
for a certain style of repository and operation where it nets a 6.5%
measured improvement. The other changes are likely not significant by
reasonable statistics methods.

Note: the measured improvement when originally submitted was ~11% (43 to
38 secs) for operation [2]. At the time, the repository had 117220
commits; it now has 137537 commits.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 tree-walk.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index dbcd94a..e401f07 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -591,8 +591,8 @@ int tree_entry_interesting(const struct name_entry *entry,
 					      ps->max_depth);
 		}
 
-		/* Does the base match? */
-		if (!strncmp(base_str, match, baselen)) {
+		/* Either there must be no base, or the base must match. */
+		if (baselen == 0 || !strncmp(base_str, match, baselen)) {
 			if (match_entry(entry, pathlen,
 					match + baselen, matchlen - baselen,
 					&never_interesting))
-- 
1.7.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-09-09  2:03 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
2011-03-31  1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
2011-08-30 18:55   ` Dan McGee
2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
2011-04-03  4:01   ` Nguyen Thai Ngoc Duy
2011-04-03 18:55   ` Junio C Hamano
2011-04-05  0:22     ` Dan McGee
     [not found]       ` <CAEik5nOKrpFycZYVnSu4_5LYWxn0JS_hVXyiQH-80Bu-C4k8VQ@mail.gmail.com>
2011-08-30 19:51         ` Dan McGee
2011-08-30 20:40           ` Junio C Hamano
2011-09-09  2:02             ` [PATCH 1/2] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
2011-09-09  2:02               ` [PATCH 2/2] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
2011-04-04 14:46   ` [PATCH] tree_entry_interesting: inline strncmp() Nguyễn Thái Ngọc Duy
2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
2011-04-02 17:28     ` Dan McGee
2011-04-03  4:07       ` Nguyen Thai Ngoc Duy
2011-04-04 10:29   ` Erik Faye-Lund
2011-04-04 12:30     ` Andreas Ericsson
2011-04-04 16:55   ` Junio C Hamano
2011-04-05  5:33     ` Dan McGee
2011-04-05 23:55       ` Antriksh Pany
2011-04-06 20:45         ` Dan McGee
2011-03-31  1:38 ` [PATCH 5/5] tree-walk: match_entry microoptimization Dan McGee
2011-04-02  9:06   ` Nguyen Thai Ngoc Duy
2011-04-02 17:54     ` Dan McGee
2011-03-31 12:58 ` [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Nguyen Thai Ngoc Duy
2011-03-31 13:56   ` Dan McGee
2011-04-01 22:28 ` Junio C Hamano
     [not found]   ` <AANLkTinPSqDPdGi5nA3sH1D2wMSW1SQc+5gRqdLy++y0@mail.gmail.com>
2011-04-02 18:38     ` Fwd: " Dan McGee
2011-05-03  7:34 ` Nguyen Thai Ngoc Duy

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).