From: Steven Grimm <koreth@midwinter.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Jon Smirl <jonsmirl@gmail.com>,
Julian Phillips <julian@quantumfyre.co.uk>,
Andreas Ericsson <ae@op5.se>, Theodore Tso <tytso@mit.edu>,
Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Thu, 06 Sep 2007 11:14:06 -0700
Message-ID: <46E0436E.9030504@midwinter.com>
In-Reply-To: <Pine.LNX.4.64.0709061354180.28586@racer.site>
Johannes Schindelin wrote:
> But you can add _yet another_ index to it, which can be generated on the
> fly, so that Git only has to generate the information once, and then reuse
> it later. As a benefit of this method, the underlying well-tested
> structure needs no change at all.
>
And in fact, you can do this today, without modifying git-blame at all,
by (ab)using its "-S" option (which lets you specify a custom ancestry
chain to search). By coincidence, I was just showing some people at my
office how to do this yesterday. I'll cut-and-paste from the email I
sent them. I am not claiming this is nearly as desirable as a built-in,
auto-updated secondary index, but it proves the concept, anyway.
Fast-to-generate version:
git rev-list HEAD -- main.c | \
awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist
Passing that file to -S speeds things up a lot, because git blame doesn't
have to examine revisions that never touched the file:
time git blame main.c
1.56s user 0.30s system 99% cpu 1.868 total
time git blame -S /tmp/revlist main.c
0.21s user 0.03s system 96% cpu 0.249 total
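To make the awk step in the fast-to-generate version concrete, here is a minimal sketch that runs the same awk program on made-up input. The function name pair_revs and the IDs c3, c2, c1 are placeholders for illustration, not anything from git. It assumes the input is newest-first, as git rev-list prints it, so each output line names a commit followed by the next-older commit that touched the file, which is the pairing the revision file is built from.

```shell
# Hypothetical helper (not part of git): read a newest-first list of
# commit IDs on stdin and print each ID followed by the one after it.
pair_revs() {
    awk '{if (last) print last " " $0; last=$0;}'
}

# Placeholder IDs standing in for real 40-character commit hashes.
printf '%s\n' c3 c2 c1 | pair_revs
# prints:
# c3 c2
# c2 c1
```

The oldest ID never appears in the first column, so it ends up with no listed parent.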
The bad news is that generating that revision list is a bit slow, and if
you do it the naive way I suggested above, you can't use the rev list
with the -M option (to follow renames). The good news is that it's
possible to have that too if you generate a list of revisions that
includes the renames:
# Generate a list of all revisions in the right order (only need to do
# this once, not once per file)
git rev-list HEAD > /tmp/all-revs
# Generate a list of the revisions that touched this file, following
# copies/renames.
# Could do this in fewer commands but this is hopefully easier to follow.
git blame --porcelain -M main.c | \
egrep '^[0-9a-f]{40}' | \
cut -d' ' -f1 | \
fgrep -f - /tmp/all-revs | \
awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist
Then -M is fast too:
time git blame -M main.c
1.72s user 0.27s system 89% cpu 2.219 total
time git blame -M -S /tmp/revlist main.c
0.29s user 0.03s system 93% cpu 0.341 total
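The rename-aware pipeline above works because the fixed-string filter keeps the matching lines in the order of /tmp/all-revs, whatever order git blame reported them in. The following is a rough sketch of that filtering on fabricated data, with no repository needed. The function name blame_to_revlist, the three fake hashes, and the sample porcelain lines are all assumptions made for the example. It uses grep -E and grep -F in place of egrep and fgrep, which newer GNU grep releases flag as obsolescent.

```shell
# Hypothetical helper (not part of git): reads porcelain-style blame
# output on stdin; $1 is a file listing all revisions, newest first.
blame_to_revlist() {
    grep -E '^[0-9a-f]{40}' |      # keep only the commit header lines
        cut -d' ' -f1 |            # keep just the hash
        grep -F -f - "$1" |        # select those hashes, in $1's order
        awk '{if (last) print last " " $0; last=$0;}'
}

# Fabricated 40-character IDs: A is the oldest, C the newest.
A="a$(printf '%039d' 1)"
B="b$(printf '%039d' 2)"
C="c$(printf '%039d' 3)"

all_revs=$(mktemp)
printf '%s\n' "$C" "$B" "$A" > "$all_revs"

# Fake blame output that credits lines to A and C but never to B.
printf '%s 1 1 1\nauthor X\n\tcode\n%s 2 2 1\n\tmore\n' "$A" "$C" |
    blame_to_revlist "$all_revs"
rm -f "$all_revs"
```

With this input the only line printed pairs C with A. B is skipped because no surviving line is attributed to it.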
Oddly, if you use the -S option, "git blame -C" actually gets
significantly *slower*. I am not sure why.
-Steve