From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dan McGee Subject: Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known Date: Sat, 2 Apr 2011 12:28:24 -0500 Message-ID: References: <1301535481-1085-1-git-send-email-dpmcgee@gmail.com> <1301535481-1085-4-git-send-email-dpmcgee@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: git@vger.kernel.org To: Nguyen Thai Ngoc Duy X-From: git-owner@vger.kernel.org Sat Apr 02 19:28:36 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Q64ci-0006L7-S6 for gcvg-git-2@lo.gmane.org; Sat, 02 Apr 2011 19:28:33 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756162Ab1DBR20 (ORCPT ); Sat, 2 Apr 2011 13:28:26 -0400 Received: from mail-ww0-f44.google.com ([74.125.82.44]:46347 "EHLO mail-ww0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756097Ab1DBR2Z (ORCPT ); Sat, 2 Apr 2011 13:28:25 -0400 Received: by wwa36 with SMTP id 36so5142899wwa.1 for ; Sat, 02 Apr 2011 10:28:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=VbJEDnprZc44nekWLX+yLhQuXUuu8k5VgEXySUlKZOk=; b=rVqG++R75xnZ/8QUfgEt9EclRjHOdPBSAWO5LVPcxPHxkGJl8ltetz7jZ66HDg0BlT rAXAXFpSJZbj0l9ZfxiwXkEETEkeStDiEI9cqy+lVxrSFmetuEW1V7G2ogpn8lAZHyBB HZDHREmAXhlZGDBg9mF6ubc78p6D8nlo72Qms= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=r2DBhTY9db+0C7RhHxThpgN9kev9cmrSNMfHYj2Zk8H80B2iyQQmWNJypvzsqFXrW5 39jhkY2S5jDZq1zXU4puMfqOKpdGOXUb8mJlNRnaldN/Zy1ZlaizlioZcDi8c8JFttyQ +t6PvzAEC3/JfSgyOWokO1nPdkD6jIOgHnGM8= Received: by 10.227.173.4 with SMTP id n4mr5459528wbz.132.1301765304391; Sat, 02 Apr 2011 10:28:24 -0700 (PDT) Received: by 10.227.147.16 with HTTP; Sat, 2 Apr 2011 10:28:24 -0700 (PDT) In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Sat, Apr 2, 2011 at 4:28 AM, Nguyen Thai Ngoc Duy wrote: > On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee wrote: >> We know our mode entry in our tree objects should be 5 or 6 characters >> long. This change both enforces this fact and also unrolls the parsing >> of the information giving the compiler more room for optimization of the >> operations. > > I'm skeptical. Did you measure signficant gain after this patch? I > looked at asm output with -O3 and failed to see the compiler doing > anything fancy. Perhaps it's because I'm on x86 with quite small > register set. I'm on x86_64 and was just using -O2; -O3 produces the same output actually. You can see it below. I had taken a look at this before I submitted, and noticed a few things: 1. We do use multiple registers now since we aren't constrained to a loop. 2. movzbl (for the string parts) and cmb instructions tend to get clustered first. 3. mozbl (for the mode shifting) and leal instructions tend to get clustered later. 4. The normal case now involves no conditional jumps until the ' ' (space) comparison. Call these "trivial", but on my worst case operation times went from (shown below) 27.41 secs to 26.49 secs. Considering this operation is called 530,588,868 times (that is not a typo) during this operation, every saved instruction or non-missed branch prediction does seem to make a difference. -Dan Repo: http://projects.archlinux.org/svntogit/packages.git/ Old: $ time ../git/git-log -- zzzzz_not_exist > /dev/null real 0m27.409s user 0m27.172s sys 0m0.230s .LVL3: .LBB58: .LBB59: .loc 1 12 0 is_stmt 1 movzbl (%rsi), %eax cmpb $32, %al je .L5 .LVL4: .loc 1 16 0 leal -48(%rax), %edx .LVL5: .loc 1 15 0 leaq 1(%rsi), %rdi .LVL6: .loc 1 16 0 cmpb $7, %dl ja .L5 xorl %edx, %edx jmp .L6 .LVL7: .p2align 4,,10 .p2align 3 .L7: leal -48(%rax), %ecx cmpb $7, %cl ja .L5 .LVL8: .L6: .loc 1 18 0 movzbl %al, %eax leal -48(%rax,%rdx,8), %edx .LVL9: .loc 1 15 0 movzbl (%rdi), %eax .LVL10: addq $1, %rdi .LVL11: cmpb $32, %al jne .L7 New: $ time ../git/git-log -- zzzzz_not_exist > /dev/null real 0m26.490s user 0m26.282s sys 0m0.200s .LVL3: .LBB58: .LBB59: .loc 1 19 0 is_stmt 1 movzbl (%rsi), %eax .LVL4: .loc 1 20 0 leal -48(%rax), %edx .LVL5: cmpb $7, %dl ja .L5 .LVL6: .loc 1 23 0 movzbl 1(%rsi), %edx .LVL7: .loc 1 24 0 leal -48(%rdx), %ecx cmpb $7, %cl ja .L5 .LVL8: .loc 1 27 0 movzbl 2(%rsi), %ecx .LVL9: .loc 1 28 0 leal -48(%rcx), %edi cmpb $7, %dil ja .L5 .LVL10: .loc 1 31 0 movzbl 3(%rsi), %edi .LVL11: .loc 1 32 0 leal -48(%rdi), %r8d cmpb $7, %r8b ja .L5 .LVL12: .loc 1 35 0 movzbl 4(%rsi), %r8d .LVL13: .loc 1 36 0 leal -48(%r8), %r9d cmpb $7, %r9b ja .L5 .loc 1 21 0 movzbl %al, %eax .loc 1 25 0 movzbl %dl, %edx .loc 1 29 0 movzbl %cl, %ecx .loc 1 25 0 leal -432(%rdx,%rax,8), %edx .loc 1 33 0 movzbl %dil, %edi .loc 1 37 0 movzbl %r8b, %r8d .loc 1 35 0 leaq 5(%rsi), %rax .loc 1 29 0 leal -48(%rcx,%rdx,8), %edx .loc 1 39 0 movzbl 5(%rsi), %ecx .loc 1 33 0 leal -48(%rdi,%rdx,8), %edx .loc 1 39 0 cmpb $32, %cl .loc 1 37 0 leal -48(%r8,%rdx,8), %edx .LVL14: .loc 1 39 0 je .L7 .LVL15: .loc 1 41 0 leal -48(%rcx), %eax cmpb $7, %al ja .L5 .loc 1 45 0 cmpb $32, 6(%rsi) .loc 1 42 0 movzbl %cl, %ecx .loc 1 40 0 leaq 6(%rsi), %rax .loc 1 42 0 leal -48(%rcx,%rdx,8), %edx .LVL16: .loc 1 45 0 jne .L5 diff --git a/tree-walk.c b/tree-walk.c index 63901f8..dd7bd45 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -4,11 +4,22 @@ #include "dir.h" #include "tree.h" +static unsigned long hit_ctr = 0; + +static void print_hit_ctr(void) +{ + fprintf(stderr, "hit_ctr: %lu\n", hit_ctr); +} + static const char *get_mode(const char *str, unsigned int *modep) { unsigned char c; unsigned int mode = 0; + if(hit_ctr == 0) { + atexit(print_hit_ctr); + } + hit_ctr++; /* * Unroll what looks like a loop since the bounds are * well-known. There should be at least 5 and at most 6