* [PATCH/RFC 0/4] Perl rewrite of Ruby git-related @ 2013-06-30 11:08 Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 1/4] contrib: add git-contacts helper Eric Sunshine ` (4 more replies) 0 siblings, 5 replies; 15+ messages in thread From: Eric Sunshine @ 2013-06-30 11:08 UTC (permalink / raw) To: git; +Cc: Eric Sunshine This is a Perl rewrite of Felipe Contreras' git-related v9 patch series[1] which was written in Ruby. Although that series was ejected from 'pu'[2], Junio suggested[3,4] that such functionality may be a useful addition to the official tool-chest, hence this Perl rewrite. In this submission, the command name has changed to git-contacts since git-related felt too generic. (git-contacts seemed best of several possibilities I surveyed: git-people, git-interested, git-mentioned, git-blame-us.) This rewrite does not maintain perfect 1-to-1 parity with Felipe's v9 series, however, it is close: minor refactoring was done to eliminate a small amount of duplicate code; patch files and revision arguments are allowed in the same invocation rather than being exclusive; "git cat-file --batch" pipe deadlock is avoided; commit messages are expanded. No attempt is made to answer Junio's v9 review[5], as I lack sufficient insight with '-C' options to be able to respond properly. My Perl may be rusty and idiomatic usage may be absent. [1]: http://thread.gmane.org/gmane.comp.version-control.git/226065/ [2]: http://article.gmane.org/gmane.comp.version-control.git/229164/ [3]: http://article.gmane.org/gmane.comp.version-control.git/226425/ [4]: http://thread.gmane.org/gmane.comp.version-control.git/221728/focus=221796 [5]: http://article.gmane.org/gmane.comp.version-control.git/226265/ Eric Sunshine (4): contrib: add git-contacts helper contrib: contacts: add support for multiple patches contrib: contacts: add ability to parse from committish contrib: contacts: interpret committish akin to format-patch contrib/contacts/git-contacts | 164 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 164 insertions(+) create mode 100755 contrib/contacts/git-contacts -- 1.8.3.2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH/RFC 1/4] contrib: add git-contacts helper 2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine @ 2013-06-30 11:08 ` Eric Sunshine 2013-07-01 18:39 ` Junio C Hamano 2013-06-30 11:08 ` [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches Eric Sunshine ` (3 subsequent siblings) 4 siblings, 1 reply; 15+ messages in thread From: Eric Sunshine @ 2013-06-30 11:08 UTC (permalink / raw) To: git; +Cc: Eric Sunshine This script lists people that might be interested in a patch by going back through the history for each patch hunk, and finding people that reviewed, acknowledge, signed, or authored the code the patch is modifying. It does this by running git-blame incrementally on each hunk and then parsing the commit message. After gathering all participants, it determines each person's relevance by considering how many commits mentioned that person compared with the total number of commits under consideration. The final output consists only of participants who pass a minimum threshold of participation. For example: % git contacts 0001-remote-hg-trivial-cleanups.patch Felipe Contreras <felipe.contreras@gmail.com> Jeff King <peff@peff.net> Max Horn <max@quendi.de> Junio C Hamano <gitster@pobox.com> Thus, it can be invoked as git-send-email's --cc-cmd option, among other possible uses. This is a Perl rewrite of Felipe Contreras' git-related patch series[1] written in Ruby. [1]: http://thread.gmane.org/gmane.comp.version-control.git/226065/ Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- To better support Windows, a follow-up patch may want to add functionality similar to run_cmd_pipe() from git-add--interactive.perl. contrib/contacts/git-contacts | 121 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100755 contrib/contacts/git-contacts diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts new file mode 100755 index 0000000..9007bae --- /dev/null +++ b/contrib/contacts/git-contacts @@ -0,0 +1,121 @@ +#!/usr/bin/perl + +# List people who might be interested in a patch. Useful as the argument to +# git-send-email --cc-cmd option, and in other situations. +# +# Usage: git contacts <file> + +use strict; +use warnings; +use IPC::Open2; + +my $since = '5-years-ago'; +my $min_percent = 10; +my $labels_rx = qr/(?:Signed-off|Reviewed|Acked)-by/; +my $id_rx = qr/[0-9a-f]{40}/i; + +sub format_contact { + my ($name, $email) = @_; + return "$name <$email>"; +} + +sub parse_commit { + my ($commit, $data) = @_; + my $contacts = $commit->{contacts}; + my $inbody = 0; + for (split(/^/m, $data)) { + if (not $inbody) { + if (/^author ([^<>]+) <(\S+)> .+$/) { + $contacts->{format_contact($1, $2)} = 1; + } elsif (/^$/) { + $inbody = 1; + } + } elsif (/^$labels_rx:\s+([^<>]+)\s+<(\S+?)>$/o) { + $contacts->{format_contact($1, $2)} = 1; + } + } +} + +sub import_commits { + my ($commits) = @_; + return unless %$commits; + my $pid = open2 my $reader, my $writer, qw(git cat-file --batch); + for my $id (keys(%$commits)) { + print $writer "$id\n"; + my $line = <$reader>; + if ($line =~ /^($id_rx) commit (\d+)/o) { + my ($cid, $len) = ($1, $2); + die "expected $id but got $cid" unless $id eq $cid; + my $data; + # cat-file emits newline after data, so read len+1 + read $reader, $data, $len + 1; + parse_commit($commits->{$id}, $data); + } + } + close $reader; + close $writer; + waitpid($pid, 0); + die "git-cat-file error: $?" if $?; +} + +sub get_blame { + my ($commits, $source, $start, $len, $from) = @_; + $len = 1 unless defined($len); + return if $len == 0; + open my $f, '-|', + qw(git blame --incremental -C -C), '-L', "$start,+$len", + '--since', $since, "$from^", '--', $source or die; + while (<$f>) { + if (/^$id_rx/o) { + my $id = $&; + $commits->{$id} = { id => $id, contacts => {} }; + } + } + close $f; +} + +sub scan_hunks { + my ($commits, $id, $f) = @_; + my $source; + while (<$f>) { + if (/^---\s+(\S+)/) { + $source = substr($1, 2) unless $1 eq '/dev/null'; + } elsif (/^@@ -(\d+)(?:,(\d+))?/ && $source) { + get_blame($commits, $source, $1, $2, $id); + } + } +} + +sub commits_from_patch { + my ($commits, $file) = @_; + open my $f, '<', $file or die "read failure: $file: $!"; + my $id; + while (<$f>) { + if (/^From ($id_rx) /o) { + $id = $1; + last; + } + } + scan_hunks($commits, $id, $f) if $id; + close $f; +} + +exit 1 unless @ARGV == 1; + +my %commits; +commits_from_patch(\%commits, $ARGV[0]); +import_commits(\%commits); + +my %count_per_person; +for my $commit (values %commits) { + for my $contact (keys %{$commit->{contacts}}) { + $count_per_person{$contact}++; + } +} + +my $ncommits = scalar(keys %commits); +for my $contact (keys %count_per_person) { + my $percent = $count_per_person{$contact} * 100 / $ncommits; + next if $percent < $min_percent; + print "$contact\n"; +} -- 1.8.3.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 1/4] contrib: add git-contacts helper 2013-06-30 11:08 ` [PATCH/RFC 1/4] contrib: add git-contacts helper Eric Sunshine @ 2013-07-01 18:39 ` Junio C Hamano 2013-07-01 19:17 ` Junio C Hamano 2013-07-02 8:17 ` Eric Sunshine 0 siblings, 2 replies; 15+ messages in thread From: Junio C Hamano @ 2013-07-01 18:39 UTC (permalink / raw) To: Eric Sunshine; +Cc: git Eric Sunshine <sunshine@sunshineco.com> writes: > diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts > new file mode 100755 > index 0000000..9007bae > --- /dev/null > +++ b/contrib/contacts/git-contacts > @@ -0,0 +1,121 @@ > +#!/usr/bin/perl > + > +# List people who might be interested in a patch. Useful as the argument to > +# git-send-email --cc-cmd option, and in other situations. > +# > +# Usage: git contacts <file> > + > +use strict; > +use warnings; > +use IPC::Open2; > + > +my $since = '5-years-ago'; > +my $min_percent = 10; > +my $labels_rx = qr/(?:Signed-off|Reviewed|Acked)-by/; Although I personally do not see particuarly a good reason to do so, I have seen people add "Cc:" on the footers, and I suspect they may expect them to be also picked up as "relevant parties" to the change. Also S-o-b is often misspelled as "Signed-Off-By", so you may want to do qr//i this one. > +my $id_rx = qr/[0-9a-f]{40}/i; On the other hand, we always mark the "this is a format-patch output" marker lines with lowercase hex, so you would want to lose 'i' from here. And you probably want to tighten it even more, perhaps like so: qr/^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$/ Of course, you wuold need to have a separate regular expression to parse "git blame --incremental/--porcelain" output that may read perhaps like so: qr/^([0-9a-f]{40})[ *](\d) (\d) (\d)$/ to pick up only the group header. The last \d is the number of lines that came from this guilty party, and it might become useful if we want to give weight to people based on line-count, not just number of commits. > +sub format_contact { > + my ($name, $email) = @_; > + return "$name <$email>"; > +} > + > +sub parse_commit { > + my ($commit, $data) = @_; > + my $contacts = $commit->{contacts}; > + my $inbody = 0; > + for (split(/^/m, $data)) { > + if (not $inbody) { > + if (/^author ([^<>]+) <(\S+)> .+$/) { > + $contacts->{format_contact($1, $2)} = 1; The author name and email can be grabbed from the "blame" output without doing this (and the result may be more robust), but you would need to read from the log message anyway, so I think this is OK. Note that the names and emails in blame output are sanitized via the mailmap mechanism, but "cat-file commit" will certainly not be. You may have to do the mapping yourself by reading the mailmap for the names and addresses you read from S-o-b: and friends. > + } elsif (/^$/) { > + $inbody = 1; > + } > + } elsif (/^$labels_rx:\s+([^<>]+)\s+<(\S+?)>$/o) { > + $contacts->{format_contact($1, $2)} = 1; > + } > + } > +} > + > +sub import_commits { > + my ($commits) = @_; > + return unless %$commits; > + my $pid = open2 my $reader, my $writer, qw(git cat-file --batch); Hmph. I vaguely recall that people wanted not to use open2/IPC::Open2 in other parts of the system. I think "cat-file --batch" is designed to behave on a regular bidi pipe, as long as the caller strictly does a ping-pong of issuing one request, flush (with an empty line) and always read the response, so if open2 becomes problematic, we could switch to regular pipes. > + for my $id (keys(%$commits)) { > + print $writer "$id\n"; > + my $line = <$reader>; > + if ($line =~ /^($id_rx) commit (\d+)/o) { > + my ($cid, $len) = ($1, $2); > + die "expected $id but got $cid" unless $id eq $cid; > + my $data; > + # cat-file emits newline after data, so read len+1 > + read $reader, $data, $len + 1; > + parse_commit($commits->{$id}, $data); > + } > + } > + close $reader; > + close $writer; > + waitpid($pid, 0); > + die "git-cat-file error: $?" if $?; > +} > + > +sub get_blame { > + my ($commits, $source, $start, $len, $from) = @_; > + $len = 1 unless defined($len); > + return if $len == 0; > + open my $f, '-|', > + qw(git blame --incremental -C -C), '-L', "$start,+$len", > + '--since', $since, "$from^", '--', $source or die; "--incremental" is meant for consumers that wants better latency, not necessarily better throughput, but I think this consumer does not present incremental result to the end user, so perhaps you would want to use "--porcelain" instead. > + while (<$f>) { > + if (/^$id_rx/o) { > + my $id = $&; > + $commits->{$id} = { id => $id, contacts => {} }; > + } > + } > + close $f; > +} > + > +sub scan_hunks { > + my ($commits, $id, $f) = @_; > + my $source; > + while (<$f>) { > + if (/^---\s+(\S+)/) { I wonder what happens to a patch that touches a file with SP in its path with this regular expression. As it is fairly clear that you are reading from format-patch output (the caller does not call this function if it did not see $id), perhaps you can tighten the prefix a bit more? I.e. something like: if (/^--- (.*)$/) > + $source = substr($1, 2) unless $1 eq '/dev/null'; (mental note) A creation patch is special-cased, but this does not allow anything that does not begin with a single-byte --src-prefix (e.g. "a/path/to/patched/file"). > + } elsif (/^@@ -(\d+)(?:,(\d+))?/ && $source) { (mental note) For each hunk of a patch that is not a creation patch, find the origin of the preimage. > + get_blame($commits, $source, $1, $2, $id); > + } An major issue (*) and a few minor issues (-) from the above observations: * A single patch may touch two or more paths. If the first one is to modify an existing file, its path is assigned to $source. Now, imagine that the second one is a creation patch. $source is not set to undef but is kept, and the code ends up trying to run blame on the first path with the range for the second path. Oops? This is one of the reasons why we shouldn't use statement modifiers lightly. I think the above should be more like: if (/^--- (.*)$) { $source = ($1 eq '/dev/null') ? undef : substr($1, 2); } elsif ... - If the patch were prepared with a non-standard src/dst-prefix, unconditional substr($1, 2) would call blame on a wrong (and likely to be nonexistent) path without a useful diagnosis (the invocation of "git blame" will likely die with "no such path 'tailofpathname' in $id"). One way to make it more robust may be to do something like this: if (/^--- /) { if (m{^--- (?:a/(.*)|/dev/null)$}) { $source = ($1 eq '/dev/null') ? undef : $1; } else { die "Cannot parse the patch file:$_"; } } elsif ... - Often a single patch touches more than one ranges in the same path. Depending on the size of the patch, it might be more efficient to run a single blame for a range that covers all the lines the patch touches while discarding irrelevant parts. A longer term improvement may be to extend "git blame" so that it can take more than one "-L n,m" ranges, but I think that is outside of the scope of this patch. > + } > +} > + > +sub commits_from_patch { > + my ($commits, $file) = @_; > + open my $f, '<', $file or die "read failure: $file: $!"; > + my $id; > + while (<$f>) { > + if (/^From ($id_rx) /o) { > + $id = $1; > + last; > + } > + } > + scan_hunks($commits, $id, $f) if $id; > + close $f; > +} > + > +exit 1 unless @ARGV == 1; > + > +my %commits; > +commits_from_patch(\%commits, $ARGV[0]); > +import_commits(\%commits); > + > +my %count_per_person; > +for my $commit (values %commits) { > + for my $contact (keys %{$commit->{contacts}}) { > + $count_per_person{$contact}++; > + } > +} > + > +my $ncommits = scalar(keys %commits); > +for my $contact (keys %count_per_person) { > + my $percent = $count_per_person{$contact} * 100 / $ncommits; > + next if $percent < $min_percent; > + print "$contact\n"; > +} ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 1/4] contrib: add git-contacts helper 2013-07-01 18:39 ` Junio C Hamano @ 2013-07-01 19:17 ` Junio C Hamano 2013-07-02 8:17 ` Eric Sunshine 1 sibling, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-07-01 19:17 UTC (permalink / raw) To: Eric Sunshine; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > - If the patch were prepared with a non-standard src/dst-prefix, > unconditional substr($1, 2) would call blame on a wrong (and > likely to be nonexistent) path without a useful diagnosis (the > invocation of "git blame" will likely die with "no such path > 'tailofpathname' in $id"). > > One way to make it more robust may be to do something like this: > > if (/^--- /) { > if (m{^--- (?:a/(.*)|/dev/null)$}) { > $source = ($1 eq '/dev/null') ? undef : $1; Typo/thinko. I did that (?:(foo)|bar) thing so that I do not have to do the conditional. The above can just be if (m{^--- (?:a/(.*)|/dev/null)$}) { $source = $1; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 1/4] contrib: add git-contacts helper 2013-07-01 18:39 ` Junio C Hamano 2013-07-01 19:17 ` Junio C Hamano @ 2013-07-02 8:17 ` Eric Sunshine 2013-07-02 18:32 ` Junio C Hamano 1 sibling, 1 reply; 15+ messages in thread From: Eric Sunshine @ 2013-07-02 8:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List On Mon, Jul 1, 2013 at 2:39 PM, Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > >> diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts >> new file mode 100755 >> index 0000000..9007bae >> --- /dev/null >> +++ b/contrib/contacts/git-contacts >> @@ -0,0 +1,121 @@ >> +#!/usr/bin/perl >> + >> +# List people who might be interested in a patch. Useful as the argument to >> +# git-send-email --cc-cmd option, and in other situations. >> +# >> +# Usage: git contacts <file> >> + >> +use strict; >> +use warnings; >> +use IPC::Open2; >> + >> +my $since = '5-years-ago'; >> +my $min_percent = 10; >> +my $labels_rx = qr/(?:Signed-off|Reviewed|Acked)-by/; > > Although I personally do not see particuarly a good reason to do so, > I have seen people add "Cc:" on the footers, and I suspect they may > expect them to be also picked up as "relevant parties" to the > change. Also S-o-b is often misspelled as "Signed-Off-By", so you > may want to do qr//i this one. I originally used /i but bogusly discarded it due to too narrow thinking. Will re-add. Agreed about adding Cc:. >> +my $id_rx = qr/[0-9a-f]{40}/i; > > On the other hand, we always mark the "this is a format-patch > output" marker lines with lowercase hex, so you would want to lose > 'i' from here. Indeed, all the SHA-1's matched by $id_rx in the script are git-generated and thus lowercase hex. Will change. > And you probably want to tighten it even more, perhaps > like so: > > qr/^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$/ > > Of course, you wuold need to have a separate regular expression to > parse "git blame --incremental/--porcelain" output that may read > perhaps like so: > > qr/^([0-9a-f]{40})[ *](\d) (\d) (\d)$/ > > to pick up only the group header. The last \d is the number of > lines that came from this guilty party, and it might become useful > if we want to give weight to people based on line-count, not just > number of commits. Can do. >> +sub format_contact { >> + my ($name, $email) = @_; >> + return "$name <$email>"; >> +} >> + >> +sub parse_commit { >> + my ($commit, $data) = @_; >> + my $contacts = $commit->{contacts}; >> + my $inbody = 0; >> + for (split(/^/m, $data)) { >> + if (not $inbody) { >> + if (/^author ([^<>]+) <(\S+)> .+$/) { >> + $contacts->{format_contact($1, $2)} = 1; > > The author name and email can be grabbed from the "blame" output > without doing this (and the result may be more robust), but you > would need to read from the log message anyway, so I think this is > OK. > > Note that the names and emails in blame output are sanitized via the > mailmap mechanism, but "cat-file commit" will certainly not be. Thanks for pointing this out. Grabbing the author name and email from git-blame output does seem like a better approach. > You may have to do the mapping yourself by reading the mailmap for > the names and addresses you read from S-o-b: and friends. Felipe did introduce mailmap support in v4 but dropped it at about v6. It seems worthwhile, but I first wanted to duplicate the basic functionality of his v9, and figured that mailmap support could be added in a follow-up patch. v4: http://article.gmane.org/gmane.comp.version-control.git/222439/ v6: http://thread.gmane.org/gmane.comp.version-control.git/224896/ >> +sub import_commits { >> + my ($commits) = @_; >> + return unless %$commits; >> + my $pid = open2 my $reader, my $writer, qw(git cat-file --batch); > > Hmph. > > I vaguely recall that people wanted not to use open2/IPC::Open2 in > other parts of the system. > > I think "cat-file --batch" is designed to behave on a regular bidi > pipe, as long as the caller strictly does a ping-pong of issuing one > request, flush (with an empty line) and always read the response, so > if open2 becomes problematic, we could switch to regular pipes. Checking the log, I see several cases where deprecated Python popen2 was removed but find nothing mentioning Perl. Git.pm, git-archimport.perl and git-cvsimport.perl appear to use open2. Switching to pipes is certainly an option. >> +sub get_blame { >> + my ($commits, $source, $start, $len, $from) = @_; >> + $len = 1 unless defined($len); >> + return if $len == 0; >> + open my $f, '-|', >> + qw(git blame --incremental -C -C), '-L', "$start,+$len", >> + '--since', $since, "$from^", '--', $source or die; > > "--incremental" is meant for consumers that wants better latency, > not necessarily better throughput, but I think this consumer does > not present incremental result to the end user, so perhaps you would > want to use "--porcelain" instead. This use of --incremental was copied literally from Felipe's v9, but after reading git-blame documentation, I agree that --porcelain makes sense. >> +sub scan_hunks { >> + my ($commits, $id, $f) = @_; >> + my $source; >> + while (<$f>) { >> + if (/^---\s+(\S+)/) { > > I wonder what happens to a patch that touches a file with SP in its > path with this regular expression. As it is fairly clear that you > are reading from format-patch output (the caller does not call this > function if it did not see $id), perhaps you can tighten the prefix > a bit more? I.e. something like: > > if (/^--- (.*)$/) Good idea. >> + } elsif (/^@@ -(\d+)(?:,(\d+))?/ && $source) { > > (mental note) For each hunk of a patch that is not a creation patch, > find the origin of the preimage. > >> + get_blame($commits, $source, $1, $2, $id); >> + } > > An major issue (*) and a few minor issues (-) from the above > observations: > > * A single patch may touch two or more paths. If the first one is > to modify an existing file, its path is assigned to $source. > Now, imagine that the second one is a creation patch. $source is > not set to undef but is kept, and the code ends up trying to run > blame on the first path with the range for the second path. > > Oops? > > This is one of the reasons why we shouldn't use statement > modifiers lightly. I think the above should be more like: > > if (/^--- (.*)$) { > $source = ($1 eq '/dev/null') ? undef : substr($1, 2); > } elsif ... This error is entirely mine. Felipe's script did it correctly. > - If the patch were prepared with a non-standard src/dst-prefix, > unconditional substr($1, 2) would call blame on a wrong (and > likely to be nonexistent) path without a useful diagnosis (the > invocation of "git blame" will likely die with "no such path > 'tailofpathname' in $id"). > > One way to make it more robust may be to do something like this: > > if (/^--- /) { > if (m{^--- (?:a/(.*)|/dev/null)$}) { > $source = ($1 eq '/dev/null') ? undef : $1; > } else { > die "Cannot parse the patch file:$_"; > } > } elsif ... The substr($1, 2) also bothered me, but I didn't immediately see a good solution. Aborting, as you suggest, seems reasonable. > - Often a single patch touches more than one ranges in the same > path. Depending on the size of the patch, it might be more > efficient to run a single blame for a range that covers all the > lines the patch touches while discarding irrelevant parts. A > longer term improvement may be to extend "git blame" so that it > can take more than one "-L n,m" ranges, but I think that is > outside of the scope of this patch. Sounds like a potentially good optimization for a future patch, though it's not clear what heuristic to use to decide when to combine the ranges for a single git-blame run. Extending git-blame to recognize multiple -L sounds even better, though definitely outside the scope of this series. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 1/4] contrib: add git-contacts helper 2013-07-02 8:17 ` Eric Sunshine @ 2013-07-02 18:32 ` Junio C Hamano 2013-07-02 19:04 ` Eric Sunshine 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2013-07-02 18:32 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List Eric Sunshine <sunshine@sunshineco.com> writes: >> The author name and email can be grabbed from the "blame" output >> without doing this (and the result may be more robust), but you >> would need to read from the log message anyway, so I think this is >> OK. >> >> Note that the names and emails in blame output are sanitized via the >> mailmap mechanism, but "cat-file commit" will certainly not be. > > Thanks for pointing this out. Grabbing the author name and email from > git-blame output does seem like a better approach. Well, that was not what I was suggesting. Reading from blame output will map only the author/committer names/addresses, and you have two choices: (1) if you read author/committer from blame output and other names from the commit object without applying mailmap, the same person can appear under different names, from his authorship (mapped name) and from his name on footers (unmapped), which would be inconsistent. By reading from "author" and "committer" header lines in the commit object, you will be at least consistently using unmapped names. (2) if you want to apply mailmap to names you collect by reading the footer, you will write the mapping logic yourself anyway, and at that point, passing the names you collect by reading the "author" and "committer" header lines in the commit object to the same logic will give you mapped names. At that point, you do not gain much by reading names from the blame output. So in either case, you would be better off not reading the author/committer from blame output, as long as you need to pick up other names from the commit object payload. >>> +sub import_commits { >>> + my ($commits) = @_; >>> + return unless %$commits; >>> + my $pid = open2 my $reader, my $writer, qw(git cat-file --batch); >> >> Hmph. >> >> I vaguely recall that people wanted not to use open2/IPC::Open2 in >> other parts of the system. >> >> I think "cat-file --batch" is designed to behave on a regular bidi >> pipe, as long as the caller strictly does a ping-pong of issuing one >> request, flush (with an empty line) and always read the response, so >> if open2 becomes problematic, we could switch to regular pipes. > > Checking the log, I see several cases where deprecated Python popen2 > was removed but find nothing mentioning Perl. Git.pm, > git-archimport.perl and git-cvsimport.perl appear to use open2. OK, then I was misremembering. Thanks for sanity checking. > ... Extending git-blame to recognize > multiple -L sounds even better, though definitely outside the scope of > this series. Yes. I might be able to find some time to do that myself later this week, but no promises ;-). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 1/4] contrib: add git-contacts helper 2013-07-02 18:32 ` Junio C Hamano @ 2013-07-02 19:04 ` Eric Sunshine 0 siblings, 0 replies; 15+ messages in thread From: Eric Sunshine @ 2013-07-02 19:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List On Tue, Jul 2, 2013 at 2:32 PM, Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > >>> The author name and email can be grabbed from the "blame" output >>> without doing this (and the result may be more robust), but you >>> would need to read from the log message anyway, so I think this is >>> OK. >>> >>> Note that the names and emails in blame output are sanitized via the >>> mailmap mechanism, but "cat-file commit" will certainly not be. >> >> Thanks for pointing this out. Grabbing the author name and email from >> git-blame output does seem like a better approach. > > Well, that was not what I was suggesting. Reading from blame output > will map only the author/committer names/addresses, and you have two > choices: > > (1) if you read author/committer from blame output and other names > from the commit object without applying mailmap, the same > person can appear under different names, from his authorship > (mapped name) and from his name on footers (unmapped), which > would be inconsistent. By reading from "author" and > "committer" header lines in the commit object, you will be at > least consistently using unmapped names. > > (2) if you want to apply mailmap to names you collect by reading > the footer, you will write the mapping logic yourself anyway, > and at that point, passing the names you collect by reading the > "author" and "committer" header lines in the commit object to > the same logic will give you mapped names. At that point, you > do not gain much by reading names from the blame output. > > So in either case, you would be better off not reading the > author/committer from blame output, as long as you need to pick up > other names from the commit object payload. Thanks for the clarification. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches 2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 1/4] contrib: add git-contacts helper Eric Sunshine @ 2013-06-30 11:08 ` Eric Sunshine 2013-07-01 18:50 ` Junio C Hamano 2013-06-30 11:08 ` [PATCH/RFC 3/4] contrib: contacts: add ability to parse from committish Eric Sunshine ` (2 subsequent siblings) 4 siblings, 1 reply; 15+ messages in thread From: Eric Sunshine @ 2013-06-30 11:08 UTC (permalink / raw) To: git; +Cc: Eric Sunshine Accept multiple patch files rather than only one. For example: % git contacts feature/*.patch Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- contrib/contacts/git-contacts | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts index 9007bae..ab11670 100755 --- a/contrib/contacts/git-contacts +++ b/contrib/contacts/git-contacts @@ -3,7 +3,7 @@ # List people who might be interested in a patch. Useful as the argument to # git-send-email --cc-cmd option, and in other situations. # -# Usage: git contacts <file> +# Usage: git contacts <file> ... use strict; use warnings; @@ -13,6 +13,7 @@ my $since = '5-years-ago'; my $min_percent = 10; my $labels_rx = qr/(?:Signed-off|Reviewed|Acked)-by/; my $id_rx = qr/[0-9a-f]{40}/i; +my %seen; sub format_contact { my ($name, $email) = @_; @@ -68,7 +69,9 @@ sub get_blame { while (<$f>) { if (/^$id_rx/o) { my $id = $&; - $commits->{$id} = { id => $id, contacts => {} }; + $commits->{$id} = { id => $id, contacts => {} } + unless $seen{$id}; + $seen{$id} = 1; } } close $f; @@ -93,6 +96,7 @@ sub commits_from_patch { while (<$f>) { if (/^From ($id_rx) /o) { $id = $1; + $seen{$id} = 1; last; } } @@ -100,10 +104,8 @@ sub commits_from_patch { close $f; } -exit 1 unless @ARGV == 1; - my %commits; -commits_from_patch(\%commits, $ARGV[0]); +commits_from_patch(\%commits, $_) for (@ARGV); import_commits(\%commits); my %count_per_person; -- 1.8.3.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches 2013-06-30 11:08 ` [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches Eric Sunshine @ 2013-07-01 18:50 ` Junio C Hamano 2013-07-01 18:57 ` Junio C Hamano 2013-07-02 8:52 ` Eric Sunshine 0 siblings, 2 replies; 15+ messages in thread From: Junio C Hamano @ 2013-07-01 18:50 UTC (permalink / raw) To: Eric Sunshine; +Cc: git Eric Sunshine <sunshine@sunshineco.com> writes: > Accept multiple patch files rather than only one. For example: > > % git contacts feature/*.patch > > Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> > @@ -93,6 +96,7 @@ sub commits_from_patch { > while (<$f>) { > if (/^From ($id_rx) /o) { > $id = $1; > + $seen{$id} = 1; > last; > } > } This looks less useful than it could be. $ git format-patch --stdout -4 >P.mbox $ git contacts P.mbox would have the same number of patches but in a single file. Wouldn't it be more useful to do something like $id = undef; while (<$f>) { if (/^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$/) { # beginning of a patch $id = $1; } next if (!defined $id); # inline the body of scan_hunks here... if (m{^--- (a/.*|/dev/null)$}) { $source = ... } elsif (/^@@ -(\d+)...) { get_blame(); } } > @@ -100,10 +104,8 @@ sub commits_from_patch { > close $f; > } > > -exit 1 unless @ARGV == 1; > - > my %commits; > -commits_from_patch(\%commits, $ARGV[0]); > +commits_from_patch(\%commits, $_) for (@ARGV); This change does not seem to account for an invocation without any argument. Perhaps write it like so to make it more readable? if (!@ARGV) { die "No input file?\n"; } for (@ARGV) { commits_from_patch(\%commits, $_); } > import_commits(\%commits); > > my %count_per_person; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches 2013-07-01 18:50 ` Junio C Hamano @ 2013-07-01 18:57 ` Junio C Hamano 2013-07-02 8:52 ` Eric Sunshine 1 sibling, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-07-01 18:57 UTC (permalink / raw) To: Eric Sunshine; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > while (<$f>) { > if (/^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$/) { > # beginning of a patch > $id = $1; > } > next if (!defined $id); > # inline the body of scan_hunks here... Or alternatively, teach scan_hunks to stop reading when it sees the beginning of the next patch (and probably you would need to return the $id you read, as it would be more cumbersome to "rewind" the input stream). > if (m{^--- (a/.*|/dev/null)$}) { > $source = ... > } elsif (/^@@ -(\d+)...) { > get_blame(); > } > } > >> @@ -100,10 +104,8 @@ sub commits_from_patch { >> close $f; >> } >> >> -exit 1 unless @ARGV == 1; >> - >> my %commits; >> -commits_from_patch(\%commits, $ARGV[0]); >> +commits_from_patch(\%commits, $_) for (@ARGV); > > This change does not seem to account for an invocation without any > argument. Perhaps write it like so to make it more readable? > > if (!@ARGV) { > die "No input file?\n"; > } > > for (@ARGV) { > commits_from_patch(\%commits, $_); > } > >> import_commits(\%commits); >> >> my %count_per_person; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches 2013-07-01 18:50 ` Junio C Hamano 2013-07-01 18:57 ` Junio C Hamano @ 2013-07-02 8:52 ` Eric Sunshine 1 sibling, 0 replies; 15+ messages in thread From: Eric Sunshine @ 2013-07-02 8:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List On Mon, Jul 1, 2013 at 2:50 PM, Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > >> Accept multiple patch files rather than only one. For example: >> >> % git contacts feature/*.patch >> >> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> > >> @@ -93,6 +96,7 @@ sub commits_from_patch { >> while (<$f>) { >> if (/^From ($id_rx) /o) { >> $id = $1; >> + $seen{$id} = 1; >> last; >> } >> } > > This looks less useful than it could be. > > $ git format-patch --stdout -4 >P.mbox > $ git contacts P.mbox > > would have the same number of patches but in a single file. > > Wouldn't it be more useful to do something like > > $id = undef; > while (<$f>) { > if (/^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$/) { > # beginning of a patch > $id = $1; > } > next if (!defined $id); > # inline the body of scan_hunks here... > if (m{^--- (a/.*|/dev/null)$}) { > $source = ... > } elsif (/^@@ -(\d+)...) { > get_blame(); > } > } Good point. It did not occur to me that a single file might contain multiple patches. Felipe's script handled this case correctly. >> @@ -100,10 +104,8 @@ sub commits_from_patch { >> close $f; >> } >> >> -exit 1 unless @ARGV == 1; >> - >> my %commits; >> -commits_from_patch(\%commits, $ARGV[0]); >> +commits_from_patch(\%commits, $_) for (@ARGV); > > This change does not seem to account for an invocation without any > argument. Perhaps write it like so to make it more readable? > > if (!@ARGV) { > die "No input file?\n"; > } > > for (@ARGV) { > commits_from_patch(\%commits, $_); > } Emitting a diagnostic about missing input makes sense. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH/RFC 3/4] contrib: contacts: add ability to parse from committish 2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 1/4] contrib: add git-contacts helper Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches Eric Sunshine @ 2013-06-30 11:08 ` Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 4/4] contrib: contacts: interpret committish akin to format-patch Eric Sunshine 2013-07-01 17:00 ` [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Junio C Hamano 4 siblings, 0 replies; 15+ messages in thread From: Eric Sunshine @ 2013-06-30 11:08 UTC (permalink / raw) To: git; +Cc: Eric Sunshine Committishes can be mentioned along with patch files in the same invocation. For example: % git contacts master..feature extra/*.patch Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- contrib/contacts/git-contacts | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts index ab11670..abb90a1 100755 --- a/contrib/contacts/git-contacts +++ b/contrib/contacts/git-contacts @@ -3,7 +3,7 @@ # List people who might be interested in a patch. Useful as the argument to # git-send-email --cc-cmd option, and in other situations. # -# Usage: git contacts <file> ... +# Usage: git contacts <file | rev-list option> ... use strict; use warnings; @@ -104,8 +104,32 @@ sub commits_from_patch { close $f; } +sub commits_from_rev_args { + my ($commits, $args) = @_; + open my $f, '-|', qw(git rev-list --reverse), @$args or die; + while (<$f>) { + chomp; + my $id = $_; + $seen{$id} = 1; + open my $g, '-|', qw(git show -C --oneline), $id or die; + scan_hunks($commits, $id, $g); + close $g; + } + close $f; +} + +my (@files, @rev_args); +for (@ARGV) { + if (-e) { + push @files, $_; + } else { + push @rev_args, $_; + } +} + my %commits; -commits_from_patch(\%commits, $_) for (@ARGV); +commits_from_patch(\%commits, $_) for (@files); +commits_from_rev_args(\%commits, \@rev_args) if @rev_args; import_commits(\%commits); my %count_per_person; -- 1.8.3.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH/RFC 4/4] contrib: contacts: interpret committish akin to format-patch 2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine ` (2 preceding siblings ...) 2013-06-30 11:08 ` [PATCH/RFC 3/4] contrib: contacts: add ability to parse from committish Eric Sunshine @ 2013-06-30 11:08 ` Eric Sunshine 2013-07-01 17:00 ` [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Junio C Hamano 4 siblings, 0 replies; 15+ messages in thread From: Eric Sunshine @ 2013-06-30 11:08 UTC (permalink / raw) To: git; +Cc: Eric Sunshine As a convenience, accept the same style committish as accepted by git-format-patch. For example: % git contacts master will consider commits in the current branch built atop 'master', just as "git format-patch master" will format commits built atop 'master'. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- contrib/contacts/git-contacts | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts index abb90a1..10d77d3 100755 --- a/contrib/contacts/git-contacts +++ b/contrib/contacts/git-contacts @@ -104,9 +104,26 @@ sub commits_from_patch { close $f; } +sub parse_rev_args { + my @args = @_; + open my $f, '-|', + qw(git rev-parse --revs-only --default HEAD --symbolic), @args + or die; + my @revs; + while (<$f>) { + chomp; + push @revs, $_; + } + close $f; + return @revs if scalar(@revs) != 1; + return "^$revs[0]", 'HEAD' unless $revs[0] =~ /^-/; + return $revs[0], 'HEAD'; +} + sub commits_from_rev_args { my ($commits, $args) = @_; - open my $f, '-|', qw(git rev-list --reverse), @$args or die; + my @revs = parse_rev_args(@$args); + open my $f, '-|', qw(git rev-list --reverse), @revs or die; while (<$f>) { chomp; my $id = $_; -- 1.8.3.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 0/4] Perl rewrite of Ruby git-related 2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine ` (3 preceding siblings ...) 2013-06-30 11:08 ` [PATCH/RFC 4/4] contrib: contacts: interpret committish akin to format-patch Eric Sunshine @ 2013-07-01 17:00 ` Junio C Hamano 2013-07-02 9:01 ` Eric Sunshine 4 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2013-07-01 17:00 UTC (permalink / raw) To: Eric Sunshine; +Cc: git Eric Sunshine <sunshine@sunshineco.com> writes: > In this submission, the command name has changed to git-contacts since > git-related felt too generic. (git-contacts seemed best of several > possibilities I surveyed: git-people, git-interested, git-mentioned, > git-blame-us.) I admit I am pretty bad at naming, but "contacts" sounds like the most sensible name for what it wants to do (blame-us sounds cute to my ears, though ;-). > No attempt is made to answer Junio's v9 review[5], as I lack sufficient > insight with '-C' options to be able to respond properly. I just wanted to see if we want to allow the end user of this script to specify what -C level they want the underlying blame to use, or just a hardcoded one should suffice (and if so an explanation why). > My Perl may be rusty and idiomatic usage may be absent. That is OK. We need to start somewhere. Thanks. Folks, please discuss ;-). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH/RFC 0/4] Perl rewrite of Ruby git-related 2013-07-01 17:00 ` [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Junio C Hamano @ 2013-07-02 9:01 ` Eric Sunshine 0 siblings, 0 replies; 15+ messages in thread From: Eric Sunshine @ 2013-07-02 9:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List On Mon, Jul 1, 2013 at 1:00 PM, Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > >> No attempt is made to answer Junio's v9 review[5], as I lack sufficient >> insight with '-C' options to be able to respond properly. > > I just wanted to see if we want to allow the end user of this script > to specify what -C level they want the underlying blame to use, or > just a hardcoded one should suffice (and if so an explanation why). It might indeed make sense to allow the user control over the -C level, just as the user likely should have control over other assumptions made by the script (to wit: $since = '5-years-ago'; $min_percent = 10). More fodder for future patches. Now that I've read up more carefully on the -C option, I too wonder why git-blame and git-show are invoked with different -C levels. At the very least, I think they should be unified in this initial patch to either one or two -C's. Since the script already is quite slow, I'm leaning toward just one -C. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-07-02 19:04 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 1/4] contrib: add git-contacts helper Eric Sunshine 2013-07-01 18:39 ` Junio C Hamano 2013-07-01 19:17 ` Junio C Hamano 2013-07-02 8:17 ` Eric Sunshine 2013-07-02 18:32 ` Junio C Hamano 2013-07-02 19:04 ` Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches Eric Sunshine 2013-07-01 18:50 ` Junio C Hamano 2013-07-01 18:57 ` Junio C Hamano 2013-07-02 8:52 ` Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 3/4] contrib: contacts: add ability to parse from committish Eric Sunshine 2013-06-30 11:08 ` [PATCH/RFC 4/4] contrib: contacts: interpret committish akin to format-patch Eric Sunshine 2013-07-01 17:00 ` [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Junio C Hamano 2013-07-02 9:01 ` Eric Sunshine
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).