git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <junkio@cox.net>
To: Chris Wedgwood <cw@f00f.org>
Cc: Linus Torvalds <torvalds@osdl.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: I want to release a "git-1.0"
Date: Tue, 31 May 2005 19:11:49 -0700	[thread overview]
Message-ID: <7vu0kiu8pm.fsf@assigned-by-dhcp.cox.net> (raw)
In-Reply-To: <Pine.LNX.4.58.0505301801520.1876@ppc970.osdl.org> (Linus Torvalds's message of "Mon, 30 May 2005 18:06:25 -0700 (PDT)")

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Mon, 30 May 2005, Chris Wedgwood wrote:
>> 
>> I'm still at a loss how to do the equivalent of annotate.  I know a
>> couple of front ends can do this but I have no idea what command line
>> magic would be equivalent.

LT> There isn't any. It's actually pretty nasty to do, following history 
LT> backwards and keeping track of lines as they are added. I know how, I'm 
LT> just really lazy and hoping somebody else will do it, since I really end 
LT> up not caring that much myself.

LT> I notice that Thomas Gleixner seems to have one, but that one is based on 
LT> a database, and doesn't look usable as a standalone command..

Here is my quick-and-dirty one done in Perl.  This is dog-slow
and not suited for interactive use, but its algorithm should
handle the merges, renames and complete rewrites correctly.

Its sample output for:

    $ blame.perl HEAD git-commit-script

look like this (I've edited the SHA1 and names to make it a bit
shorter, but it still does not fit on my 80-column terminal X-<).

For each line in the version in the HEAD, it outputs (TAB
separated) the SHA1 of the commit that is responsible for the
line to be there, author, commiter, line number in the version
of the guity commit and the filename in the guilty commit (this
file could have been renamed in which case this may not match
the name of the file the script was originally asked to
annotate).  It shows that 9th line was what was in
a3e870f2... commit as 11th line done by Linus, for example.

:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	1	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	2	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	3	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	4	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	5	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	6	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	7	git-commit-script
:2036d841...	Junio C....@cox.net>	Linus T....osdl.org>	8	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	11	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	12	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	13	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	14	git-commit-script
:a3e870f2...	Linus T....osdl.org>	Linus T....osdl.org>	15	git-commit-script

------------
A blame script for use by higher-level annotate tools.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff -u a/blame.perl b/blame.perl
--- /dev/null
+++ b/blame.perl
@@ -0,0 +1,400 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+package main;
+$::debug = 0;
+
+sub read_blob {
+    my $sha1 = shift;
+    my $fh = undef;
+    my $result;
+    local ($/) = undef;
+    open $fh, '-|', 'git-cat-file', 'blob', $sha1
+	or die "cannot read blob $sha1";
+    $result = join('', <$fh>);
+    close $fh
+	or die "failure while closing pipe to git-cat-file";
+    return $result;
+}
+
+sub read_diff_raw {
+    my ($parent, $filename) = @_;
+    my $fh = undef;
+    local ($/) = "\0";
+    my @result = (); 
+    my ($meta, $status, $sha1_1, $sha1_2, $file1, $file2);
+    print STDERR "* diff-cache $parent\n" if $::debug;
+    open $fh, '-|', 'git-diff-cache', '-B', '-C', '--cached', '-z', $parent
+	or die "cannot read git-diff-cache with $parent";
+    while (defined ($meta = <$fh>)) {
+	chomp($meta);
+	(undef, undef, $sha1_1, $sha1_2, $status) = split(/ /, $meta);
+	$file1 = <$fh>;
+	chomp($file1);
+	if ($status =~ /^[CR]/) {
+	    $file2 = <$fh>;
+	    chomp($file2);
+	} elsif ($status =~ /^D/) {
+	    next;
+	} else {
+	    $file2 = $file1;
+	}
+	if ($file2 eq $filename) {
+	    push @result, [$status, $sha1_1, $sha1_2, $file1, $file2];
+	}
+    }
+    close $fh
+	or die "failure while closing pipe to git-diff-cache";
+    return @result;
+}
+
+sub write_temp_blob {
+    my ($sha1, $temp) = @_;
+    my $fh = undef;
+    my $blob = read_blob($sha1);
+    open $fh, '>', $temp
+	or die "cannot open temporary file $temp";
+    print $fh $blob;
+    close($fh);
+}
+
+package Git::Patch;
+sub new {
+    my ($class, $sha1_1, $sha1_2) = @_;
+    my $self = bless [], $class;
+    my $fh = undef;
+    ::write_temp_blob($sha1_1, "/tmp/blame-$$-1");
+    ::write_temp_blob($sha1_2, "/tmp/blame-$$-2");
+    open $fh, '-|', 'diff', '-u0', "/tmp/blame-$$-1", "/tmp/blame-$$-2"
+	or die "cannot read diff";
+    while (<$fh>) {
+	if (/^\@\@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? \@\@/) {
+	    push @$self, [$1, (defined $2 ? $2 : 1),
+			  $3, (defined $4 ? $4 : 1)];
+	}
+    }
+    close $fh;
+    unlink "/tmp/blame-$$-1", "/tmp/blame-$$-2";
+    return $self;
+}
+
+sub find_parent_line {
+    my ($self, $commit_lineno) = @_;
+    my $ofs = 0;
+    for (@$self) {
+	my ($line_1, $len_1, $line_2, $len_2) = @$_;
+	if ($commit_lineno < $line_2) {
+	    return $commit_lineno - $ofs;
+	}
+	if ($line_2 <= $commit_lineno && $commit_lineno < $line_2 + $len_2) {
+	    return -1; # changed by commit.
+	}
+	$ofs += ($len_1 - $len_2);
+    }
+    return $commit_lineno + $ofs;
+}
+
+package Git::Commit;
+sub new {
+    my $class = shift;
+    my $self = bless {
+	PARENT => [],
+	TREE => undef,
+	AUTHOR => undef,
+	COMMITTER => undef,
+    }, $class;
+    my $commit_sha1 = shift;
+    $self->{SHA1} = $commit_sha1;
+    my $fh = undef;
+    open $fh, '-|', 'git-cat-file', 'commit', $commit_sha1
+	or die "cannot read commit object $commit_sha1";
+    while (<$fh>) {
+	chomp;
+	if (/^tree ([0-9a-f]{40})$/) { $self->{TREE} = $1; }
+	elsif (/^parent ([0-9a-f]{40})$/) { push @{$self->{PARENT}}, $1; }
+	elsif (/^author ([^>]+>)/) { $self->{AUTHOR} = $1; }
+	elsif (/^committer ([^>]+>)/) { $self->{COMMITTER} = $1; }
+    }
+    close $fh
+	or die "failure while closing pipe to git-cat-file";
+    return $self;
+}
+
+sub find_file {
+    my ($commit, $path) = @_;
+    my $result = undef;
+    my $fh = undef;
+    local ($/) = "\0";
+    open $fh, '-|', 'git-ls-tree', '-z', '-r', '-d', $commit->{TREE}, $path
+	or die "cannot read git-ls-tree $commit->{TREE}";
+    while (<$fh>) {
+	chomp;
+	if (/^[0-7]{6} blob ([0-9a-f]{40})	(.*)$/) {
+	    if ($2 ne $path) {
+		die "$2 ne $path???";
+	    }
+	    $result = $1;
+	    last;
+	}
+    }
+    close $fh
+	or die "failure while closing pipe to git-ls-tree";
+    return $result;
+}
+
+package Git::Blame;
+sub new {
+    my $class = shift;
+    my $self = bless {
+	LINE => [],
+	UNKNOWN => undef,
+	WORK => [],
+    }, $class;
+    my $commit = shift;
+    my $filename = shift;
+    my $sha1 = $commit->find_file($filename);
+    my $blob = ::read_blob($sha1);
+    my @blob = (split(/\n/, $blob));
+    for (my $i = 0; $i < @blob; $i++) {
+	$self->{LINE}[$i] = +{
+	    COMMIT => $commit,
+	    FOUND => undef,
+	    FILENAME => $filename,
+	    LINENO => ($i + 1),
+	};
+    }
+    $self->{UNKNOWN} = scalar @blob;
+    push @{$self->{WORK}}, [$commit, $filename];
+    return $self;
+}
+
+sub print {
+    my $self = shift;
+    my $line_termination = shift;
+    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+	my $l = $self->{LINE}[$i];
+	print ($l->{FOUND} ? ':' : '?');;
+	print "$l->{COMMIT}->{SHA1}	";
+	print "$l->{COMMIT}->{AUTHOR}	";
+	print "$l->{COMMIT}->{COMMITTER}	";
+	print "$l->{LINENO}	$l->{FILENAME}";
+	print $line_termination;
+    }
+}
+
+sub take_responsibility {
+    my ($self, $commit) = @_;
+    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+	my $l = $self->{LINE}[$i];
+	if (! $l->{FOUND} && ($l->{COMMIT}->{SHA1} eq $commit->{SHA1})) {
+	    $l->{FOUND} = 1;
+	    $self->{UNKNOWN}--;
+	}
+    }
+}
+
+sub blame_parent {
+    my ($self, $commit, $parent, $filename) = @_;
+    my @diff = ::read_diff_raw($parent->{SHA1}, $filename);
+    my $filename_in_parent;
+    my $passed_blame_to_parent = undef;
+    if (@diff == 0) {
+	# We have not touched anything.  Blame parent for everything
+	# that we are suspected for.
+	for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+	    my $l = $self->{LINE}[$i];
+	    if (! $l->{FOUND} && ($l->{COMMIT}->{SHA1} eq $commit->{SHA1})) {
+		$l->{COMMIT} = $parent;
+		$passed_blame_to_parent = 1;
+	    }
+	}
+	$filename_in_parent = $filename;
+    }
+    elsif (@diff != 1) {
+	# This should not happen.
+	for (@diff) {
+	    print "** @$_\n";
+	}
+	die "Oops";
+    }
+    else {
+	my ($status, $sha1_1, $sha1_2, $file1, $file2) = @{$diff[0]};
+	print STDERR "** $status $file1 $file2\n" if $::debug;
+	if ($status =~ /N/) {
+	    # Either some of other parents created it, or we did.
+	    # At this point the only thing we know is that this
+	    # parent is not responsible for it.
+	    ;
+	}
+	else {
+	    my $patch = Git::Patch->new($sha1_1, $sha1_2);
+	    $filename_in_parent = $file1;
+	    for (my $i = 0; $i < @{$self->{LINE}}; $i++) {
+		my $l = $self->{LINE}[$i];
+		if (! $l->{FOUND} && $l->{COMMIT}->{SHA1} eq $commit->{SHA1}) {
+		    # We are suspected to have introduced this line.
+		    # Does it exist in the parent?
+		    my $lineno = $l->{LINENO};
+		    my $parent_line = $patch->find_parent_line($lineno);
+		    if ($parent_line < 0) {
+			# No, we may be the guilty ones, or some other
+			# parent might be.  We do not assign blame to
+			# ourselves here yet.
+			;
+		    }
+		    else {
+			# This line is coming from the parent, so pass
+			# blame to it.
+			$l->{COMMIT} = $parent;
+			$l->{FILENAME} = $file1;
+			$l->{LINENO} = $parent_line;
+			$passed_blame_to_parent = 1;
+		    }
+		}
+	    }
+	}
+    }
+    if ($passed_blame_to_parent && $self->{UNKNOWN}) {
+	unshift @{$self->{WORK}},
+	[$parent, $filename_in_parent];
+    }
+}
+
+sub assign {
+    my ($self, $commit, $filename) = @_;
+    # We do read-tree of the current commit and diff-cache
+    # with each parents, instead of running diff-tree.  This
+    # is because diff-tree does not look for copies hard enough.
+    #
+    print STDERR "* read-tree  $commit->{SHA1}\n" if $::debug;
+    system('git-read-tree', '-m', $commit->{SHA1});
+    for my $parent (@{$commit->{PARENT}}) {
+	$self->blame_parent($commit, Git::Commit->new($parent), $filename);
+    }
+    $self->take_responsibility($commit);
+}
+
+sub assign_blame {
+    my ($self) = @_;
+    while ($self->{UNKNOWN} && @{$self->{WORK}}) {
+	my $wk = shift @{$self->{WORK}};
+	my ($commit, $filename) = @$wk;
+	$self->assign($commit, $filename);
+    }
+}
+
+
+
+################################################################
+package main;
+my $usage = "blame [-z] <commit> filename";
+my $line_termination = "\n";
+
+$::ENV{GIT_INDEX_FILE} = "/tmp/blame-$$-index";
+unlink($::ENV{GIT_INDEX_FILE});
+
+if ($ARGV[0] eq '-z') {
+    $line_termination = "\0";
+    shift;
+}
+
+if (@ARGV != 2) {
+    die $usage;
+}
+
+my $head_commit = Git::Commit->new($ARGV[0]);
+my $filename = $ARGV[1];
+my $blame = Git::Blame->new($head_commit, $filename);
+
+$blame->assign_blame();
+$blame->print($line_termination);
+
+unlink($::ENV{GIT_INDEX_FILE});
+
+__END__
+
+How does this work, and what do we do about merges?
+
+The algorithm considers that the first parent is our main line of
+development and treats it somewhat special than other parents.  So we
+pass on the blame to the first parent if a line has not changed from
+it.  For lines that have changed from the first parent, we must have
+either inherited that change from some other parent, or it could have
+been merge conflict resolution edit we did on our own.
+
+The following picture illustrates how we pass on and assign blames.
+
+In the sample, the original O was forked into A and B and then merged
+into M.  Line 1, 2, and 4 did not change.  Line 3 and 5 are changed in
+A, and Line 5 and 6 are changed in B.  M made its own decision to
+resolve merge conflicts at Line 5 to something different from A and B:
+
+                A: 1 2 T 4 T 6
+               /               \ 
+O: 1 2 3 4 5 6                  M: 1 2 T 4 M S
+               \               / 
+                B: 1 2 3 4 S S
+
+In the following picture, each line is annotated with a blame letter.
+A lowercase blame (e.g. "a" for "1") means that commit or its ancestor
+is the guilty party but we do not know which particular ancestor is
+responsible for the change yet.  An uppercase blame means that we know
+that commit is the guilty party.
+
+First we look at M (the HEAD) and initialize Git::Blame->{LINE} like
+this:
+
+             M: 1 2 T 4 M S
+                m m m m m m
+
+That is, we know all lines are results of modification made by some
+ancestor of M, so we assign lowercase 'm' to all of them.
+
+Then we examine our first parent A.  Throughout the algorithm, we are
+always only interested in the lines we are the suspect, but this being
+the initial round, we are the suspect for all of them.  We notice that
+1 2 T 4 are the same as the parent A, so we pass the blame for these
+four lines to A.  M and S are different from A, so we leave them as
+they are (note that we do not immediately take the blame for them):
+
+             M: 1 2 T 4 M S
+                a a a a m m
+
+Next we go on to examine parent B.  Again, we are only interested in
+the lines we are still the suspect (i.e. M and S).  We notice S is
+something we inherited from B, so we pass the blame on to it, like
+this:
+
+             M: 1 2 T 4 M S
+                a a a a m b
+
+Once we exhausted the parents, we look at the results and take
+responsibility for the remaining ones that we are still the suspect:
+
+             M: 1 2 T 4 M S
+                a a a a M b
+
+We are done with M.  And we know commits A and B need to be examined
+further, so we do them recursively.  When we look at A, we again only
+look at the lines that A is the suspect:
+
+             A: 1 2 T 4 T 6
+                a a a a M b
+
+Among 1 2 T 4, comparing against its parent O, we notice 1 2 4 are
+the same so pass the blame for those lines to O:
+
+             A: 1 2 T 4 T 6
+                o o a o M b
+
+A is a non-merge commit; we have already exhausted the parents and
+take responsibility for the remaining ones that A is the suspect:
+
+             A: 1 2 T 4 T 6
+                o o A o M b
+
+We go on like this and the final result would become:
+
+             O: 1 2 3 4 5 6
+                O O A O M B


  reply	other threads:[~2005-06-01  2:11 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-30 20:00 I want to release a "git-1.0" Linus Torvalds
2005-05-30 20:33 ` jeff millar
2005-05-30 20:49 ` Nicolas Pitre
2005-06-01  6:52   ` Junio C Hamano
2005-06-01  8:24     ` [PATCH] Add -d flag to git-pull-* family Junio C Hamano
2005-06-01 14:39       ` Nicolas Pitre
2005-06-01 16:00         ` Junio C Hamano
     [not found]           ` <7v1x7lk8fl.fsf_-_@assigned-by-dhcp.cox.net>
2005-06-02  0:47             ` [PATCH] Handle deltified object correctly in git-*-pull family Nicolas Pitre
     [not found]             ` <7vpsv5hbm5.fsf@assigned-by-dhcp.cox.net>
2005-06-02  0:51               ` [PATCH] Stop inflating the whole SHA1 file only to check size Nicolas Pitre
2005-06-02  1:32                 ` Junio C Hamano
2005-06-02  0:58             ` [PATCH] Handle deltified object correctly in git-*-pull family Linus Torvalds
2005-06-02  1:43               ` Junio C Hamano
2005-05-30 20:59 ` I want to release a "git-1.0" Junio C Hamano
2005-05-30 21:07 ` Junio C Hamano
2005-05-30 22:11 ` David Greaves
2005-05-30 22:12 ` Dave Jones
2005-05-30 22:55   ` Dmitry Torokhov
2005-05-30 23:15     ` Junio C Hamano
2005-05-30 23:23     ` Dmitry Torokhov
2005-05-31  0:52   ` Linus Torvalds
2005-05-30 22:19 ` Ryan Anderson
2005-05-31  0:58   ` Linus Torvalds
2005-05-30 22:32 ` Chris Wedgwood
2005-05-30 23:56   ` Chris Wedgwood
2005-05-31  1:06   ` Linus Torvalds
2005-06-01  2:11     ` Junio C Hamano [this message]
2005-06-01  2:25       ` David Lang
2005-06-01  4:53         ` Junio C Hamano
2005-06-01 20:06           ` David Lang
2005-06-01 20:16             ` C. Scott Ananian
2005-06-02  0:43               ` Nicolas Pitre
2005-06-02  1:14                 ` Brian O'Mahoney
2005-06-01 23:03             ` Junio C Hamano
2005-05-31  0:19 ` Petr Baudis
2005-05-31 13:45 ` Eric W. Biederman
2005-06-01  3:04   ` Linus Torvalds
2005-06-01  4:06     ` Junio C Hamano
2005-06-02 23:54       ` [PATCH] Fix -B "very-different" logic Junio C Hamano
2005-06-03  0:21         ` Linus Torvalds
2005-06-03  1:33           ` Junio C Hamano
2005-06-03  8:32             ` [PATCH 0/4] " Junio C Hamano
2005-06-03  8:36               ` [PATCH 1/4] Tweak count-delta interface Junio C Hamano
2005-06-03  8:36               ` [PATCH 2/4] diff: Fix docs and add -O to diff-helper Junio C Hamano
2005-06-03  8:37               ` [PATCH 3/4] diff: Clean up diff_scoreopt_parse() Junio C Hamano
2005-06-03  8:40               ` [PATCH 4/4] diff: Update -B heuristics Junio C Hamano
2005-06-01  6:28     ` I want to release a "git-1.0" Junio C Hamano
2005-06-01 22:00     ` Daniel Barkalow
2005-06-01 23:05       ` Junio C Hamano
2005-06-03  9:47       ` Petr Baudis
2005-06-03 15:09         ` Daniel Barkalow
2005-06-02  7:15     ` Eric W. Biederman
2005-06-02  8:32       ` Kay Sievers
2005-06-02 14:52       ` Linus Torvalds
2005-06-02 12:02     ` [PATCH] several typos in tutorial Alexey Nezhdanov
2005-06-02 12:41       ` Vincent Hanquez
2005-06-02 12:45         ` Alexey Nezhdanov
2005-06-02 12:51           ` Vincent Hanquez
2005-06-02 12:56             ` Alexey Nezhdanov
2005-06-02 13:00             ` Alexey Nezhdanov
2005-06-02 23:40     ` I want to release a "git-1.0" Adam Kropelin
2005-06-03  0:06       ` Linus Torvalds
2005-06-03  0:47         ` Linus Torvalds
2005-06-03  1:34           ` Adam Kropelin
2005-06-02 19:43 ` CVS migration section to the tutorial Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vu0kiu8pm.fsf@assigned-by-dhcp.cox.net \
    --to=junkio@cox.net \
    --cc=cw@f00f.org \
    --cc=git@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).