From: Eric Sunshine <sunshine@sunshineco.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH/RFC 1/4] contrib: add git-contacts helper
Date: Tue, 2 Jul 2013 04:17:14 -0400 [thread overview]
Message-ID: <CAPig+cRbBGqrXj-Anib1ESdBBbdUGM-9um4XoPcwG2QxJBubuA@mail.gmail.com> (raw)
In-Reply-To: <7vbo6mgm5e.fsf@alter.siamese.dyndns.org>
On Mon, Jul 1, 2013 at 2:39 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> diff --git a/contrib/contacts/git-contacts b/contrib/contacts/git-contacts
>> new file mode 100755
>> index 0000000..9007bae
>> --- /dev/null
>> +++ b/contrib/contacts/git-contacts
>> @@ -0,0 +1,121 @@
>> +#!/usr/bin/perl
>> +
>> +# List people who might be interested in a patch. Useful as the argument to
>> +# git-send-email --cc-cmd option, and in other situations.
>> +#
>> +# Usage: git contacts <file>
>> +
>> +use strict;
>> +use warnings;
>> +use IPC::Open2;
>> +
>> +my $since = '5-years-ago';
>> +my $min_percent = 10;
>> +my $labels_rx = qr/(?:Signed-off|Reviewed|Acked)-by/;
>
> Although I personally do not see particuarly a good reason to do so,
> I have seen people add "Cc:" on the footers, and I suspect they may
> expect them to be also picked up as "relevant parties" to the
> change. Also S-o-b is often misspelled as "Signed-Off-By", so you
> may want to do qr//i this one.
I originally used /i but bogusly discarded it due to too narrow
thinking. Will re-add.
Agreed about adding Cc:.
>> +my $id_rx = qr/[0-9a-f]{40}/i;
>
> On the other hand, we always mark the "this is a format-patch
> output" marker lines with lowercase hex, so you would want to lose
> 'i' from here.
Indeed, all the SHA-1's matched by $id_rx in the script are
git-generated and thus lowercase hex. Will change.
> And you probably want to tighten it even more, perhaps
> like so:
>
> qr/^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$/
>
> Of course, you wuold need to have a separate regular expression to
> parse "git blame --incremental/--porcelain" output that may read
> perhaps like so:
>
> qr/^([0-9a-f]{40})[ *](\d) (\d) (\d)$/
>
> to pick up only the group header. The last \d is the number of
> lines that came from this guilty party, and it might become useful
> if we want to give weight to people based on line-count, not just
> number of commits.
Can do.
>> +sub format_contact {
>> + my ($name, $email) = @_;
>> + return "$name <$email>";
>> +}
>> +
>> +sub parse_commit {
>> + my ($commit, $data) = @_;
>> + my $contacts = $commit->{contacts};
>> + my $inbody = 0;
>> + for (split(/^/m, $data)) {
>> + if (not $inbody) {
>> + if (/^author ([^<>]+) <(\S+)> .+$/) {
>> + $contacts->{format_contact($1, $2)} = 1;
>
> The author name and email can be grabbed from the "blame" output
> without doing this (and the result may be more robust), but you
> would need to read from the log message anyway, so I think this is
> OK.
>
> Note that the names and emails in blame output are sanitized via the
> mailmap mechanism, but "cat-file commit" will certainly not be.
Thanks for pointing this out. Grabbing the author name and email from
git-blame output does seem like a better approach.
> You may have to do the mapping yourself by reading the mailmap for
> the names and addresses you read from S-o-b: and friends.
Felipe did introduce mailmap support in v4 but dropped it at about v6.
It seems worthwhile, but I first wanted to duplicate the basic
functionality of his v9, and figured that mailmap support could be
added in a follow-up patch.
v4: http://article.gmane.org/gmane.comp.version-control.git/222439/
v6: http://thread.gmane.org/gmane.comp.version-control.git/224896/
>> +sub import_commits {
>> + my ($commits) = @_;
>> + return unless %$commits;
>> + my $pid = open2 my $reader, my $writer, qw(git cat-file --batch);
>
> Hmph.
>
> I vaguely recall that people wanted not to use open2/IPC::Open2 in
> other parts of the system.
>
> I think "cat-file --batch" is designed to behave on a regular bidi
> pipe, as long as the caller strictly does a ping-pong of issuing one
> request, flush (with an empty line) and always read the response, so
> if open2 becomes problematic, we could switch to regular pipes.
Checking the log, I see several cases where deprecated Python popen2
was removed but find nothing mentioning Perl. Git.pm,
git-archimport.perl and git-cvsimport.perl appear to use open2.
Switching to pipes is certainly an option.
>> +sub get_blame {
>> + my ($commits, $source, $start, $len, $from) = @_;
>> + $len = 1 unless defined($len);
>> + return if $len == 0;
>> + open my $f, '-|',
>> + qw(git blame --incremental -C -C), '-L', "$start,+$len",
>> + '--since', $since, "$from^", '--', $source or die;
>
> "--incremental" is meant for consumers that wants better latency,
> not necessarily better throughput, but I think this consumer does
> not present incremental result to the end user, so perhaps you would
> want to use "--porcelain" instead.
This use of --incremental was copied literally from Felipe's v9, but
after reading git-blame documentation, I agree that --porcelain makes
sense.
>> +sub scan_hunks {
>> + my ($commits, $id, $f) = @_;
>> + my $source;
>> + while (<$f>) {
>> + if (/^---\s+(\S+)/) {
>
> I wonder what happens to a patch that touches a file with SP in its
> path with this regular expression. As it is fairly clear that you
> are reading from format-patch output (the caller does not call this
> function if it did not see $id), perhaps you can tighten the prefix
> a bit more? I.e. something like:
>
> if (/^--- (.*)$/)
Good idea.
>> + } elsif (/^@@ -(\d+)(?:,(\d+))?/ && $source) {
>
> (mental note) For each hunk of a patch that is not a creation patch,
> find the origin of the preimage.
>
>> + get_blame($commits, $source, $1, $2, $id);
>> + }
>
> An major issue (*) and a few minor issues (-) from the above
> observations:
>
> * A single patch may touch two or more paths. If the first one is
> to modify an existing file, its path is assigned to $source.
> Now, imagine that the second one is a creation patch. $source is
> not set to undef but is kept, and the code ends up trying to run
> blame on the first path with the range for the second path.
>
> Oops?
>
> This is one of the reasons why we shouldn't use statement
> modifiers lightly. I think the above should be more like:
>
> if (/^--- (.*)$) {
> $source = ($1 eq '/dev/null') ? undef : substr($1, 2);
> } elsif ...
This error is entirely mine. Felipe's script did it correctly.
> - If the patch were prepared with a non-standard src/dst-prefix,
> unconditional substr($1, 2) would call blame on a wrong (and
> likely to be nonexistent) path without a useful diagnosis (the
> invocation of "git blame" will likely die with "no such path
> 'tailofpathname' in $id").
>
> One way to make it more robust may be to do something like this:
>
> if (/^--- /) {
> if (m{^--- (?:a/(.*)|/dev/null)$}) {
> $source = ($1 eq '/dev/null') ? undef : $1;
> } else {
> die "Cannot parse the patch file:$_";
> }
> } elsif ...
The substr($1, 2) also bothered me, but I didn't immediately see a
good solution. Aborting, as you suggest, seems reasonable.
> - Often a single patch touches more than one ranges in the same
> path. Depending on the size of the patch, it might be more
> efficient to run a single blame for a range that covers all the
> lines the patch touches while discarding irrelevant parts. A
> longer term improvement may be to extend "git blame" so that it
> can take more than one "-L n,m" ranges, but I think that is
> outside of the scope of this patch.
Sounds like a potentially good optimization for a future patch, though
it's not clear what heuristic to use to decide when to combine the
ranges for a single git-blame run. Extending git-blame to recognize
multiple -L sounds even better, though definitely outside the scope of
this series.
next prev parent reply other threads:[~2013-07-02 8:17 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-30 11:08 [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Eric Sunshine
2013-06-30 11:08 ` [PATCH/RFC 1/4] contrib: add git-contacts helper Eric Sunshine
2013-07-01 18:39 ` Junio C Hamano
2013-07-01 19:17 ` Junio C Hamano
2013-07-02 8:17 ` Eric Sunshine [this message]
2013-07-02 18:32 ` Junio C Hamano
2013-07-02 19:04 ` Eric Sunshine
2013-06-30 11:08 ` [PATCH/RFC 2/4] contrib: contacts: add support for multiple patches Eric Sunshine
2013-07-01 18:50 ` Junio C Hamano
2013-07-01 18:57 ` Junio C Hamano
2013-07-02 8:52 ` Eric Sunshine
2013-06-30 11:08 ` [PATCH/RFC 3/4] contrib: contacts: add ability to parse from committish Eric Sunshine
2013-06-30 11:08 ` [PATCH/RFC 4/4] contrib: contacts: interpret committish akin to format-patch Eric Sunshine
2013-07-01 17:00 ` [PATCH/RFC 0/4] Perl rewrite of Ruby git-related Junio C Hamano
2013-07-02 9:01 ` Eric Sunshine
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAPig+cRbBGqrXj-Anib1ESdBBbdUGM-9um4XoPcwG2QxJBubuA@mail.gmail.com \
--to=sunshine@sunshineco.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).