From: Dan McGee <dpmcgee@gmail.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known
Date: Sat, 2 Apr 2011 12:28:24 -0500 [thread overview]
Message-ID: <BANLkTi=MupnQ9Ovy=A0nD+wDaK7wkVDryw@mail.gmail.com> (raw)
In-Reply-To: <BANLkTi=QK0_P3=rGFLXzZzk7c7JSNxuBmA@mail.gmail.com>
On Sat, Apr 2, 2011 at 4:28 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@gmail.com> wrote:
>> We know our mode entry in our tree objects should be 5 or 6 characters
>> long. This change both enforces this fact and also unrolls the parsing
>> of the information giving the compiler more room for optimization of the
>> operations.
>
> I'm skeptical. Did you measure signficant gain after this patch? I
> looked at asm output with -O3 and failed to see the compiler doing
> anything fancy. Perhaps it's because I'm on x86 with quite small
> register set.
I'm on x86_64 and was just using -O2; -O3 produces the same output
actually. You can see it below. I had taken a look at this before I
submitted, and noticed a few things:
1. We do use multiple registers now since we aren't constrained to a loop.
2. movzbl (for the string parts) and cmb instructions tend to get
clustered first.
3. mozbl (for the mode shifting) and leal instructions tend to get
clustered later.
4. The normal case now involves no conditional jumps until the ' '
(space) comparison.
Call these "trivial", but on my worst case operation times went from
(shown below) 27.41 secs to 26.49 secs. Considering this operation is
called 530,588,868 times (that is not a typo) during this operation,
every saved instruction or non-missed branch prediction does seem to
make a difference.
-Dan
Repo:
http://projects.archlinux.org/svntogit/packages.git/
Old:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null
real 0m27.409s
user 0m27.172s
sys 0m0.230s
.LVL3:
.LBB58:
.LBB59:
.loc 1 12 0 is_stmt 1
movzbl (%rsi), %eax
cmpb $32, %al
je .L5
.LVL4:
.loc 1 16 0
leal -48(%rax), %edx
.LVL5:
.loc 1 15 0
leaq 1(%rsi), %rdi
.LVL6:
.loc 1 16 0
cmpb $7, %dl
ja .L5
xorl %edx, %edx
jmp .L6
.LVL7:
.p2align 4,,10
.p2align 3
.L7:
leal -48(%rax), %ecx
cmpb $7, %cl
ja .L5
.LVL8:
.L6:
.loc 1 18 0
movzbl %al, %eax
leal -48(%rax,%rdx,8), %edx
.LVL9:
.loc 1 15 0
movzbl (%rdi), %eax
.LVL10:
addq $1, %rdi
.LVL11:
cmpb $32, %al
jne .L7
New:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null
real 0m26.490s
user 0m26.282s
sys 0m0.200s
.LVL3:
.LBB58:
.LBB59:
.loc 1 19 0 is_stmt 1
movzbl (%rsi), %eax
.LVL4:
.loc 1 20 0
leal -48(%rax), %edx
.LVL5:
cmpb $7, %dl
ja .L5
.LVL6:
.loc 1 23 0
movzbl 1(%rsi), %edx
.LVL7:
.loc 1 24 0
leal -48(%rdx), %ecx
cmpb $7, %cl
ja .L5
.LVL8:
.loc 1 27 0
movzbl 2(%rsi), %ecx
.LVL9:
.loc 1 28 0
leal -48(%rcx), %edi
cmpb $7, %dil
ja .L5
.LVL10:
.loc 1 31 0
movzbl 3(%rsi), %edi
.LVL11:
.loc 1 32 0
leal -48(%rdi), %r8d
cmpb $7, %r8b
ja .L5
.LVL12:
.loc 1 35 0
movzbl 4(%rsi), %r8d
.LVL13:
.loc 1 36 0
leal -48(%r8), %r9d
cmpb $7, %r9b
ja .L5
.loc 1 21 0
movzbl %al, %eax
.loc 1 25 0
movzbl %dl, %edx
.loc 1 29 0
movzbl %cl, %ecx
.loc 1 25 0
leal -432(%rdx,%rax,8), %edx
.loc 1 33 0
movzbl %dil, %edi
.loc 1 37 0
movzbl %r8b, %r8d
.loc 1 35 0
leaq 5(%rsi), %rax
.loc 1 29 0
leal -48(%rcx,%rdx,8), %edx
.loc 1 39 0
movzbl 5(%rsi), %ecx
.loc 1 33 0
leal -48(%rdi,%rdx,8), %edx
.loc 1 39 0
cmpb $32, %cl
.loc 1 37 0
leal -48(%r8,%rdx,8), %edx
.LVL14:
.loc 1 39 0
je .L7
.LVL15:
.loc 1 41 0
leal -48(%rcx), %eax
cmpb $7, %al
ja .L5
.loc 1 45 0
cmpb $32, 6(%rsi)
.loc 1 42 0
movzbl %cl, %ecx
.loc 1 40 0
leaq 6(%rsi), %rax
.loc 1 42 0
leal -48(%rcx,%rdx,8), %edx
.LVL16:
.loc 1 45 0
jne .L5
diff --git a/tree-walk.c b/tree-walk.c
index 63901f8..dd7bd45 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -4,11 +4,22 @@
#include "dir.h"
#include "tree.h"
+static unsigned long hit_ctr = 0;
+
+static void print_hit_ctr(void)
+{
+ fprintf(stderr, "hit_ctr: %lu\n", hit_ctr);
+}
+
static const char *get_mode(const char *str, unsigned int *modep)
{
unsigned char c;
unsigned int mode = 0;
+ if(hit_ctr == 0) {
+ atexit(print_hit_ctr);
+ }
+ hit_ctr++;
/*
* Unroll what looks like a loop since the bounds are
* well-known. There should be at least 5 and at most 6
next prev parent reply other threads:[~2011-04-02 17:28 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-31 1:37 [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Dan McGee
2011-03-31 1:37 ` [PATCH 2/5] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
2011-08-30 18:55 ` Dan McGee
2011-03-31 1:37 ` [PATCH 3/5] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
2011-04-03 4:01 ` Nguyen Thai Ngoc Duy
2011-04-03 18:55 ` Junio C Hamano
2011-04-05 0:22 ` Dan McGee
[not found] ` <CAEik5nOKrpFycZYVnSu4_5LYWxn0JS_hVXyiQH-80Bu-C4k8VQ@mail.gmail.com>
2011-08-30 19:51 ` Dan McGee
2011-08-30 20:40 ` Junio C Hamano
2011-09-09 2:02 ` [PATCH 1/2] tree-walk: drop unused parameter from match_dir_prefix Dan McGee
2011-09-09 2:02 ` [PATCH 2/2] tree-walk: micro-optimization in tree_entry_interesting Dan McGee
2011-04-04 14:46 ` [PATCH] tree_entry_interesting: inline strncmp() Nguyễn Thái Ngọc Duy
2011-03-31 1:38 ` [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Dan McGee
2011-04-02 9:28 ` Nguyen Thai Ngoc Duy
2011-04-02 17:28 ` Dan McGee [this message]
2011-04-03 4:07 ` Nguyen Thai Ngoc Duy
2011-04-04 10:29 ` Erik Faye-Lund
2011-04-04 12:30 ` Andreas Ericsson
2011-04-04 16:55 ` Junio C Hamano
2011-04-05 5:33 ` Dan McGee
2011-04-05 23:55 ` Antriksh Pany
2011-04-06 20:45 ` Dan McGee
2011-03-31 1:38 ` [PATCH 5/5] tree-walk: match_entry microoptimization Dan McGee
2011-04-02 9:06 ` Nguyen Thai Ngoc Duy
2011-04-02 17:54 ` Dan McGee
2011-03-31 12:58 ` [PATCH 1/5] diff_tree_sha1: skip diff_tree if old == new Nguyen Thai Ngoc Duy
2011-03-31 13:56 ` Dan McGee
2011-04-01 22:28 ` Junio C Hamano
[not found] ` <AANLkTinPSqDPdGi5nA3sH1D2wMSW1SQc+5gRqdLy++y0@mail.gmail.com>
2011-04-02 18:38 ` Fwd: " Dan McGee
2011-05-03 7:34 ` Nguyen Thai Ngoc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='BANLkTi=MupnQ9Ovy=A0nD+wDaK7wkVDryw@mail.gmail.com' \
--to=dpmcgee@gmail.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).