git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Dan McGee <dpmcgee@gmail.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
Date: Sat, 2 Apr 2011 12:28:24 -0500	[thread overview]
Message-ID: <BANLkTi=MupnQ9Ovy=A0nD+wDaK7wkVDryw@mail.gmail.com> (raw)
In-Reply-To: <BANLkTi=QK0_P3=rGFLXzZzk7c7JSNxuBmA@mail.gmail.com>

On Sat, Apr 2, 2011 at 4:28 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
>> We know our mode entry in our tree objects should be 5 or 6 characters
>> long. This change both enforces this fact and also unrolls the parsing
>> of the information giving the compiler more room for optimization of the
>> operations.
>
> I'm skeptical. Did you measure signficant gain after this patch? I
> looked at asm output with -O3 and failed to see the compiler doing
> anything fancy. Perhaps it's because I'm on x86 with quite small
> register set.

I'm on x86_64 and was just using -O2; -O3 produces the same output
actually. You can see it below. I had taken a look at this before I
submitted, and noticed a few things:
1. We do use multiple registers now since we aren't constrained to a loop.
2. movzbl (for the string parts) and cmb instructions tend to get
clustered first.
3. mozbl (for the mode shifting) and leal instructions tend to get
clustered later.
4. The normal case now involves no conditional jumps until the ' '
(space) comparison.

Call these "trivial", but on my worst case operation times went from
(shown below) 27.41 secs to 26.49 secs. Considering this operation is
called 530,588,868 times (that is not a typo) during this operation,
every saved instruction or non-missed branch prediction does seem to
make a difference.

-Dan

Repo:
http://projects.archlinux.org/svntogit/packages.git/

Old:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null

real	0m27.409s
user	0m27.172s
sys	0m0.230s

.LVL3:
.LBB58:
.LBB59:
	.loc 1 12 0 is_stmt 1
	movzbl	(%rsi), %eax
	cmpb	$32, %al
	je	.L5
.LVL4:
	.loc 1 16 0
	leal	-48(%rax), %edx
.LVL5:
	.loc 1 15 0
	leaq	1(%rsi), %rdi
.LVL6:
	.loc 1 16 0
	cmpb	$7, %dl
	ja	.L5
	xorl	%edx, %edx
	jmp	.L6
.LVL7:
	.p2align 4,,10
	.p2align 3
.L7:
	leal	-48(%rax), %ecx
	cmpb	$7, %cl
	ja	.L5
.LVL8:
.L6:
	.loc 1 18 0
	movzbl	%al, %eax
	leal	-48(%rax,%rdx,8), %edx
.LVL9:
	.loc 1 15 0
	movzbl	(%rdi), %eax
.LVL10:
	addq	$1, %rdi
.LVL11:
	cmpb	$32, %al
	jne	.L7


New:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null

real	0m26.490s
user	0m26.282s
sys	0m0.200s

.LVL3:
.LBB58:
.LBB59:
	.loc 1 19 0 is_stmt 1
	movzbl	(%rsi), %eax
.LVL4:
	.loc 1 20 0
	leal	-48(%rax), %edx
.LVL5:
	cmpb	$7, %dl
	ja	.L5
.LVL6:
	.loc 1 23 0
	movzbl	1(%rsi), %edx
.LVL7:
	.loc 1 24 0
	leal	-48(%rdx), %ecx
	cmpb	$7, %cl
	ja	.L5
.LVL8:
	.loc 1 27 0
	movzbl	2(%rsi), %ecx
.LVL9:
	.loc 1 28 0
	leal	-48(%rcx), %edi
	cmpb	$7, %dil
	ja	.L5
.LVL10:
	.loc 1 31 0
	movzbl	3(%rsi), %edi
.LVL11:
	.loc 1 32 0
	leal	-48(%rdi), %r8d
	cmpb	$7, %r8b
	ja	.L5
.LVL12:
	.loc 1 35 0
	movzbl	4(%rsi), %r8d
.LVL13:
	.loc 1 36 0
	leal	-48(%r8), %r9d
	cmpb	$7, %r9b
	ja	.L5
	.loc 1 21 0
	movzbl	%al, %eax
	.loc 1 25 0
	movzbl	%dl, %edx
	.loc 1 29 0
	movzbl	%cl, %ecx
	.loc 1 25 0
	leal	-432(%rdx,%rax,8), %edx
	.loc 1 33 0
	movzbl	%dil, %edi
	.loc 1 37 0
	movzbl	%r8b, %r8d
	.loc 1 35 0
	leaq	5(%rsi), %rax
	.loc 1 29 0
	leal	-48(%rcx,%rdx,8), %edx
	.loc 1 39 0
	movzbl	5(%rsi), %ecx
	.loc 1 33 0
	leal	-48(%rdi,%rdx,8), %edx
	.loc 1 39 0
	cmpb	$32, %cl
	.loc 1 37 0
	leal	-48(%r8,%rdx,8), %edx
.LVL14:
	.loc 1 39 0
	je	.L7
.LVL15:
	.loc 1 41 0
	leal	-48(%rcx), %eax
	cmpb	$7, %al
	ja	.L5
	.loc 1 45 0
	cmpb	$32, 6(%rsi)
	.loc 1 42 0
	movzbl	%cl, %ecx
	.loc 1 40 0
	leaq	6(%rsi), %rax
	.loc 1 42 0
	leal	-48(%rcx,%rdx,8), %edx
.LVL16:
	.loc 1 45 0
	jne	.L5

diff --git a/tree-walk.c b/tree-walk.c
index 63901f8..dd7bd45 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -4,11 +4,22 @@
 #include "dir.h"
 #include "tree.h"

+static unsigned long hit_ctr = 0;
+
+static void print_hit_ctr(void)
+{
+       fprintf(stderr, "hit_ctr: %lu\n", hit_ctr);
+}
+
 static const char *get_mode(const char *str, unsigned int *modep)
 {
        unsigned char c;
        unsigned int mode = 0;

+       if(hit_ctr == 0) {
+               atexit(print_hit_ctr);
+       }
+       hit_ctr++;
        /*
         * Unroll what looks like a loop since the bounds are
         * well-known. There should be at least 5 and at most 6

  reply	other threads:[~2011-04-02 17:28 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-31  1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
2011-03-31  1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
2011-08-30 18:55   ` Dan McGee
2011-03-31  1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
2011-04-03  4:01   ` Nguyen Thai Ngoc Duy
2011-04-03 18:55   ` Junio C Hamano
2011-04-05  0:22     ` Dan McGee
     [not found]       ` <CAEik5nOKrpFycZYVnSu4_5LYWxn0JS_hVXyiQH-80Bu-C4k8VQ@mail.gmail.com>
2011-08-30 19:51         ` Dan McGee
2011-08-30 20:40           ` Junio C Hamano
2011-09-09  2:02             ` [PATCH 1/2] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
2011-09-09  2:02               ` [PATCH 2/2] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
2011-04-04 14:46   ` [PATCH] tree_entry_interesting: inline strncmp() Nguyễn Thái Ngọc Duy
2011-03-31  1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
2011-04-02  9:28   ` Nguyen Thai Ngoc Duy
2011-04-02 17:28     ` Dan McGee [this message]
2011-04-03  4:07       ` Nguyen Thai Ngoc Duy
2011-04-04 10:29   ` Erik Faye-Lund
2011-04-04 12:30     ` Andreas Ericsson
2011-04-04 16:55   ` Junio C Hamano
2011-04-05  5:33     ` Dan McGee
2011-04-05 23:55       ` Antriksh Pany
2011-04-06 20:45         ` Dan McGee
2011-03-31  1:38 ` [PATCH 5/5] tree-walk: match_entry microoptimization Dan McGee
2011-04-02  9:06   ` Nguyen Thai Ngoc Duy
2011-04-02 17:54     ` Dan McGee
2011-03-31 12:58 ` [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Nguyen Thai Ngoc Duy
2011-03-31 13:56   ` Dan McGee
2011-04-01 22:28 ` Junio C Hamano
     [not found]   ` <AANLkTinPSqDPdGi5nA3sH1D2wMSW1SQc+5gRqdLy++y0@mail.gmail.com>
2011-04-02 18:38     ` Fwd: " Dan McGee
2011-05-03  7:34 ` Nguyen Thai Ngoc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='BANLkTi=MupnQ9Ovy=A0nD+wDaK7wkVDryw@mail.gmail.com' \
    --to=dpmcgee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).