git@vger.kernel.org mailing list mirror (one of many)
From: Steven Grimm <koreth@midwinter.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Jon Smirl <jonsmirl@gmail.com>,
	Julian Phillips <julian@quantumfyre.co.uk>,
	Andreas Ericsson <ae@op5.se>, Theodore Tso <tytso@mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Thu, 06 Sep 2007 11:14:06 -0700
Message-ID: <46E0436E.9030504@midwinter.com>
In-Reply-To: <Pine.LNX.4.64.0709061354180.28586@racer.site>

Johannes Schindelin wrote:
> But you can add _yet another_ index to it, which can be generated on the 
> fly, so that Git only has to generate the information once, and then reuse 
> it later.  As a benefit of this method, the underlying well-tested 
> structure needs no change at all.
>   

And in fact, you can do this today, without modifying git-blame at all, 
by (ab)using its "-S" option (which lets you specify a custom ancestry 
chain to search). By coincidence, I was just showing some people at my 
office how to do this yesterday. I'll cut-and-paste from the email I 
sent them. I am not claiming this is nearly as desirable as a built-in, 
auto-updated secondary index, but it proves the concept, anyway.

Fast-to-generate version:

git rev-list HEAD -- main.c | awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

This speeds things up a lot, because git blame doesn't have to examine 
other revisions:

time git blame main.c
   1.56s user 0.30s system 99% cpu 1.868 total
time git blame -S /tmp/revlist main.c
   0.21s user 0.03s system 96% cpu 0.249 total
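
To make the file format concrete: the awk step turns a newest-first list of 
commit IDs into "commit parent" pairs, one per line, which is the shape of 
file I am feeding to -S here. A minimal sketch on made-up IDs (c3, c2, c1 
are placeholders, and pair_revs is just a name I picked for illustration):

```shell
#!/bin/sh
# pair_revs: read a newest-first list of commit IDs on stdin and print
# "commit parent" pairs, one per line (same awk as the one-liner above).
pair_revs() {
    awk '{ if (last) print last " " $0; last = $0 }'
}

# Made-up IDs standing in for `git rev-list HEAD -- main.c` output.
printf '%s\n' c3 c2 c1 | pair_revs
# prints:
#   c3 c2
#   c2 c1
```

The oldest ID never appears in the first column, so a list with a single 
revision produces an empty file.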

The bad news is that generating that revision list is a bit slow, and if 
you do it the naive way I suggested above, you can't use the rev list 
with the -M option (to follow renames). The good news is that it's 
possible to have that too if you generate a list of revisions that 
includes the renames:

# Generate a list of all revisions in the right order (only need to do
# this once, not once per file)
git rev-list HEAD > /tmp/all-revs
# Generate a list of the revisions that touched this file, following
# copies/renames.
# Could do this in fewer commands but this is hopefully easier to follow.
git blame --porcelain -M main.c | \
   grep -E '^[0-9a-f]{40}' | \
   cut -d' ' -f1 | \
   grep -F -f - /tmp/all-revs | \
   awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

Then -M is fast too:

time git blame -M main.c
   1.72s user 0.27s system 89% cpu 2.219 total
time git blame -M -S /tmp/revlist main.c
   0.29s user 0.03s system 93% cpu 0.341 total
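
The reason for going through /tmp/all-revs is ordering: blame reports 
commits in the order the lines appear in the file, with repeats, while the 
pairing step needs them in history order. Using the blame output as a set 
of fixed-string patterns against the full list gives that order back. A 
small sketch with made-up IDs (filter_in_order is a hypothetical helper 
name, not a git command):

```shell
#!/bin/sh
# filter_in_order ALL_REVS: read wanted commit IDs on stdin (any order,
# duplicates allowed) and print the matching lines of ALL_REVS in the
# order they appear in that file.
filter_in_order() {
    grep -F -f - "$1"
}

all_revs=$(mktemp)
printf '%s\n' e5 d4 c3 b2 a1 > "$all_revs"   # made-up IDs, newest first

# Stand-in for the blame output: line order, not history order.
printf '%s\n' a1 d4 a1 c3 | filter_in_order "$all_revs"
# prints:
#   d4
#   c3
#   a1
rm -f "$all_revs"
```

Note that grep -F matches substrings, which is harmless for full 40-character 
IDs but would matter if you fed it abbreviated ones.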

Oddly, if you use the -S option, "git blame -C" actually gets 
significantly *slower*. I am not sure why.

-Steve
